MemeStreams | MemeStreams Discussion


This page contains all of the posts and discussion on MemeStreams referencing the following web page: How to download ALL TED Talks at once, using Bash. You can find discussions on MemeStreams as you surf the web, even if you aren't a MemeStreams member, using the Threads Bookmarklet.

How to download ALL TED Talks at once, using Bash by Lost at 11:14 pm EDT, Mar 9, 2009
aurynn: ok. here we go: first part:

    for a in $( seq 1 34 ); do lynx -dump http://www.ted.com/index.php/talks/list/page/$a > $a.html; done

aurynn: it fetches the lists of talks, 34 pages of them, and stores each page as *.html (which is bad, as it is not html)

aurynn: 2nd part:

    cat *.html | perl -ne 'print if s{^\s*\d+\.\s+(http://www.ted.com/index.php/talks/[^/]+\.html)\s*$}{$1\n}' | sort | uniq > big.list

aurynn: it generates a list of all the talk pages.

aurynn:

    cat big.list | while read URL; do OUTPUT=$( echo $URL | sed 's#.*/#out/#;s#html$#txt#' ); lynx -dump "$URL" > "$OUTPUT"; echo $URL; done

it fetches all the talk pages and stores them as *.txt

    for a in *.txt; do grep -q -E '\[[0-9]+\]Video to desktop \(Zipped MP4\)' $a || echo $a; done

this lists all the *.txt files which don't have a "video to desktop" link (two talks, both seem to be someone singing). i removed these .txt files

finally:

    for a in *.txt; do POS=$( grep -E '\[[0-9]+\]Video to desktop \(Zipped MP4\)' $a | sed 's/^.*\[//;s/\].*//' ); grep -E "^[[:space:]]*$POS\.[[:space:]]*http" $a | sed "s/.*http/wget -O $a.zip http/;s/txt.zip/zip/"; done > runme.sh

this generates runme.sh, which wgets all the videos and stores them as *.zip
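The text-processing steps in the chat can be sketched as small shell functions, which makes each one easier to try on a single file before running the whole pipeline. This is only a sketch under assumptions: the function names (extract_talk_urls, url_to_output, wget_line_for) are hypothetical and not from the original post, the input is assumed to be `lynx -dump` output with a numbered reference list, and the URLs and link text are the ones quoted in the 2009 chat, which may no longer match the site.

```shell
#!/usr/bin/env bash
# Sketch of the parsing steps from the chat; function names are hypothetical.
# Network calls (lynx, wget) are left to the caller.

# Read lynx -dump output on stdin and print the unique talk-page URLs.
extract_talk_urls() {
    sed -n -E 's#^[[:space:]]*[0-9]+\.[[:space:]]+(http://www\.ted\.com/index\.php/talks/[^/]+\.html)[[:space:]]*$#\1#p' | sort -u
}

# Map a talk URL to the file its dump is stored in (out/NAME.txt).
url_to_output() {
    printf '%s\n' "$1" | sed 's#.*/#out/#;s#html$#txt#'
}

# Given a dumped talk page, print the wget command for its zipped MP4.
# Returns 1 if the page has no "Video to desktop" link.
wget_line_for() {
    local file=$1 pos url
    # Reference number of the "Video to desktop" link, e.g. 12 from "[12]Video ..."
    pos=$(grep -E '\[[0-9]+\]Video to desktop \(Zipped MP4\)' "$file" | head -n 1 | sed 's/^.*\[//;s/\].*//')
    [ -n "$pos" ] || return 1
    # Look that number up in the reference list at the end of the dump.
    url=$(grep -E "^[[:space:]]*$pos\.[[:space:]]*http" "$file" | head -n 1 | sed -E 's/^[[:space:]]*[0-9]+\.[[:space:]]*//')
    [ -n "$url" ] || return 1
    printf 'wget -O %s %s\n' "${file%.txt}.zip" "$url"
}

# Example use (needs lynx and network access, so it is left commented out):
#   mkdir -p out
#   cat *.html | extract_talk_urls > big.list
#   while read -r URL; do lynx -dump "$URL" > "$(url_to_output "$URL")"; done < big.list
#   for f in out/*.txt; do wget_line_for "$f" || echo "no video link: $f" >&2; done > runme.sh
```

Unlike the one-liners, wget_line_for reports pages without a video link itself, so the separate "list the .txt files with no link" pass and the manual cleanup are folded into the final loop.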