During a course on Coursera I wanted to keep the Jupyter Notebook content for later reference, so I set out to download it, with all its files, to my own computer.
There are a lot of suggestions online on how to download Jupyter Notebooks and their files, but none of them really helped for this course, because it contained a lot of symbolic links that I wanted to download as well. With most of the solutions I found, the symbolic links were downloaded as mere references, not as the files they point to.
So after a bit of research I came up with the following. It worked for me and downloaded everything, including symlinked content.
This is for the course Convolutional Neural Networks on Coursera. Note that the third and fourth cells will not be needed in most cases, but in the course Sequence Models they are needed to remove some symlinks that link to themselves, creating an infinite loop (a quick way to check for such links is sketched below).
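To see whether a course folder contains such problem links before copying anything, a check like the one below can help. This is just a minimal sketch, assuming Python 3; the path `../week1` is only an example and should be replaced with your own folder.

```python
# Minimal sketch: walk a course folder and flag symlinks that cannot be
# resolved. os.path.exists() follows links, so it returns False for a
# broken or self-referencing symlink while os.path.islink() is still True.
import os

for root, dirs, files in os.walk('../week1'):
    for name in dirs + files:
        path = os.path.join(root, name)
        if os.path.islink(path) and not os.path.exists(path):
            print('broken or self-referencing symlink:', path)
```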
Here is what to do to download all Jupyter Notebooks and all files including symlinked content:
First, create a directory named utils. Then create a notebook in that directory and run the following cells (replace the folder names with the ones you want to copy).
First cell
```
# make directories for downloads
!mkdir downloads
!mkdir downloads_zip
```
Second cell
```
# copy files
!cp -R ../week1 downloads/week1
!cp -R ../week2 downloads/week2
!cp -R ../week3 downloads/week3
!cp -R ../week4 downloads/week4
!cp -R ../dummy downloads/dummy
!cp -R ../readonly downloads/readonly
```
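If the cell above fails with a "directory doesn't exist" error, the folder names in your workspace probably differ from the ones in the example; listing the parent directory first shows what is actually there. (Plain Python, nothing course-specific assumed.)

```python
# Show the folders one level up, so the cp commands above can be adjusted
# to the names that actually exist in your workspace.
import os

print(sorted(os.listdir('..')))
```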
Third cell
```
# list symbolic links pointing to themselves (if nothing is printed, there is no need to run the next cell)
!cd downloads/ && find ./ -type l -exec sh -c 'readlink -f "{}" 1>/dev/null || echo "{}"' -- "{}" \;
```
Fourth cell
```
# remove symbolic links pointing to themselves (if any)
!cd downloads/ && find ./ -type l -exec sh -c 'readlink -f "{}" 1>/dev/null || rm "{}"' -- "{}" \;
```
Fifth cell
```
# copy the files again, this time dereferencing symlinks
# (tar's -h flag archives the files the links point to instead of the links)
!tar -hcf - downloads | tar -xf - -C downloads_zip
```
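If you prefer to stay in Python instead of piping tar, shutil can do the same dereferencing copy. This is only a sketch of an alternative, not part of the original recipe, and it assumes the self-referencing links were already removed in cell four (copytree raises an error on links it cannot resolve).

```python
# Alternative to the tar pipe: shutil.copytree follows symlinks by default
# (symlinks=False), so it copies the files the links point to rather than
# the links themselves.
import shutil

shutil.copytree('downloads', 'downloads_zip/downloads')
```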
Sixth cell
```
# gzip the files
!tar cvfz downloads_zip.tar.gz downloads_zip
```
Seventh cell
```
# split the archive so the parts are small enough to download
!split -b 200m downloads_zip.tar.gz downloads_zip.tar.gz.part.
```
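Optionally, you can record a checksum of the archive before downloading, so the reassembled file can be verified later. A small hashlib sketch; the filename matches the cells above, everything else here is my own addition.

```python
# Compute a SHA-256 of the full archive (split leaves the original file in
# place), so the reassembled file on the local machine can be checked
# against this value.
import hashlib

h = hashlib.sha256()
with open('downloads_zip.tar.gz', 'rb') as f:
    for chunk in iter(lambda: f.read(1 << 20), b''):
        h.update(chunk)
print(h.hexdigest())
```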
Eighth cell
```
# run this in your terminal on your local machine, in the folder where you
# downloaded the parts (works on Mac/Linux; for Windows see the Python
# sketch below)
# cat downloads_zip.tar.gz.part.* > downloads_zip.tar.gz
```
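Since the cat one-liner is Unix-only, here is a cross-platform alternative: a small Python script that concatenates the parts on any OS, run locally in the folder containing the downloaded parts. The part names match what split produced above.

```python
# Reassemble the downloaded parts on any OS. sorted() restores the order
# of split's alphabetical suffixes (.part.aa, .part.ab, ...).
import glob
import shutil

with open('downloads_zip.tar.gz', 'wb') as out:
    for part in sorted(glob.glob('downloads_zip.tar.gz.part.*')):
        with open(part, 'rb') as f:
            shutil.copyfileobj(f, out)
```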
Ninth cell
```
# clean up
!rm -rf downloads/
!rm -rf downloads_zip/
!rm -f downloads_zip.tar.gz
!rm -f downloads_zip.tar.gz.part.*
```
You showed the example for course 4, right? I tried the second step but it says the directory doesn't exist… But I checked on Jupyter; the folders do exist. Any idea what's wrong?
Thanks for the useful advice.
I managed to download and unzip all files to my local PC following your instructions. However, when I upload one of the notebooks to Jupyter Notebook, the embedded images are not shown. Do you have any idea what the reason might be?
I'm now getting this error at the download step:
`HTTP chunk size exceeds the configured limit of 1048576 bytes`
Any workaround?
I would try writing to Coursera's tech support and asking whether this is intentional. A quick Google search suggests this is a setting in the Akka HTTP client; you might ask if they could change 'maxChunkSize' to something higher. That would probably make the script work again.