MemeStreams Discussion

This page contains all of the posts and discussion on MemeStreams referencing the following web page: How to download ALL TED Talks at once, using Bash.

How to download ALL TED Talks at once, using Bash
by Lost at 11:14 pm EDT, Mar 9, 2009

aurynn: ok. here we go: first part: for a in $( seq 1 34 ); do lynx -dump http://www.ted.com/index.php/talks/list/page/$a > $a.html; done

aurynn: it fetches the lists of talks, 34 pages of them, and stores each page as .html (which is a bad name, as lynx -dump writes plain text, not html)
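The first step can be sketched with the dumps saved under a .txt name instead. This is only a sketch: the function names list_page_url and fetch_list_pages and the list-N.txt file names are made up for illustration, and the URL pattern is simply the one quoted in the post.

```shell
#!/usr/bin/env bash
# Sketch: fetch the talk list pages as plain text dumps.
# Function and file names here are hypothetical, not from the post.

# Build the URL of list page N (URL pattern taken from the post).
list_page_url() {
    printf 'http://www.ted.com/index.php/talks/list/page/%s\n' "$1"
}

# Dump pages 1..LAST into list-N.txt; needs lynx and network access.
fetch_list_pages() {
    local last="$1" n
    for n in $(seq 1 "$last"); do
        lynx -dump "$(list_page_url "$n")" > "list-$n.txt"
    done
}

# Usage (not run automatically): fetch_list_pages 34
```

With these names, the second part would read list-*.txt instead of *.html.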

aurynn: 2nd part: cat *.html | perl -ne 'print if s{^\s*\d+\.\s+(http://www.ted.com/index.php/talks/[^/]+\.html)\s*$}{$1\n}' | sort | uniq > big.list

aurynn: it generates a sorted, de-duplicated list of the URLs of all talk pages (big.list).
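To see what that filter keeps, here is a rough equivalent using sed in place of the perl one-liner, fed with made-up sample lines. The function name extract_talk_urls and the example_talk_* page names are assumptions for illustration; real lynx output may differ in detail.

```shell
#!/usr/bin/env bash
# Sketch: keep only "N. <talk page URL>" reference lines from lynx dumps,
# strip the numbering, and de-duplicate. sed stands in for the post's perl.

extract_talk_urls() {
    sed -n -E 's#^[[:space:]]*[0-9]+\.[[:space:]]+(http://www\.ted\.com/index\.php/talks/[^/]+\.html)[[:space:]]*$#\1#p' \
        | sort -u
}

# Usage with hypothetical sample input (talk names are made up):
printf '%s\n' \
    '   1. http://www.ted.com/index.php/talks/example_talk_one.html' \
    '   2. http://www.ted.com/index.php/talks/list/page/2' \
    '   3. http://www.ted.com/index.php/talks/example_talk_one.html' \
    '   4. http://www.ted.com/index.php/talks/example_talk_two.html' \
    | extract_talk_urls
```

The list/page/2 line is dropped because it has a further slash and no .html suffix, and the repeated talk URL collapses to a single entry.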

aurynn: mkdir -p out; cat big.list | while read -r URL; do OUTPUT=$( echo "$URL" | sed 's#.*/#out/#;s#html$#txt#' ); lynx -dump "$URL" > "$OUTPUT"; echo "$URL"; done

it fetches every talk page and stores each one as a .txt file in the out/ directory (so cd out before the next steps)
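The sed expression in that loop is what turns a talk URL into an output file name. A minimal sketch of just that mapping, with a hypothetical function name url_to_output and a made-up talk name:

```shell
#!/usr/bin/env bash
# Sketch: map a talk page URL to the file the dump is written to.
# First substitution: replace everything up to the last "/" with "out/".
# Second substitution: swap the trailing "html" for "txt".

url_to_output() {
    printf '%s\n' "$1" | sed 's#.*/#out/#;s#html$#txt#'
}

# Usage (the talk name is made up for illustration):
url_to_output 'http://www.ted.com/index.php/talks/example_talk.html'
```

Using # as the sed delimiter avoids having to escape the slashes in the URL.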

for a in *.txt; do grep -q -E '\[[0-9]+\]Video to desktop \(Zipped MP4\)' $a || echo $a; done

this lists all .txt files which don't have the "Video to desktop" link (two talks, both seem to be someone singing)

I removed these .txt files.
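The same check can be wrapped in a function so the list can be reviewed before anything is deleted. This is a sketch; the name list_missing_download_link and the directory argument are assumptions, and the grep pattern is the one from the post.

```shell
#!/usr/bin/env bash
# Sketch: print every .txt dump in a directory that lacks the
# "[N]Video to desktop (Zipped MP4)" link.

list_missing_download_link() {
    local dir="$1" f
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue   # no .txt files at all: glob stayed literal
        grep -q -E '\[[0-9]+\]Video to desktop \(Zipped MP4\)' "$f" \
            || printf '%s\n' "$f"
    done
}

# Usage (not run automatically): list_missing_download_link out
```

Printing the names first and removing the files as a separate step makes it easy to check that only the expected talks are affected.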

finally: for a in *.txt; do POS=$( grep -E '\[[0-9]+\]Video to desktop \(Zipped MP4\)' $a | sed 's/^.*\[//;s/\].*//' ); grep -E "^[[:space:]]*$POS\.[[:space:]]http" $a | sed "s/.*http/wget -O $a.zip http/;s/txt.zip/zip/"; done > runme.sh

this generates runme.sh, which wgets all the videos and stores them as .zip files
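The last step relies on how lynx -dump numbers links: the link text carries a [N] marker, and the matching "N. URL" entry appears in the reference list at the end of the dump. A sketch of that lookup for a single file follows; the function name make_wget_line and the example.com URL are made up, and the layout of the sample dump is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: turn one lynx text dump into a single wget command line.

make_wget_line() {
    local file="$1" pos url
    # Reference number N of the "[N]Video to desktop (Zipped MP4)" link.
    pos=$(grep -o -E '\[[0-9]+\]Video to desktop \(Zipped MP4\)' "$file" \
        | head -n 1 | sed -E 's/^\[([0-9]+)\].*/\1/')
    [ -n "$pos" ] || return 1
    # Matching "N. http..." entry in the reference list of the dump.
    url=$(grep -E "^[[:space:]]*$pos\.[[:space:]]http" "$file" \
        | head -n 1 | sed -E 's/^[[:space:]]*[0-9]+\.[[:space:]]+//')
    [ -n "$url" ] || return 1
    # Name the download after the dump: example_talk.txt -> example_talk.zip
    printf 'wget -O %s.zip %s\n' "$(basename "$file" .txt)" "$url"
}

# Usage (not run automatically):
#   for f in *.txt; do make_wget_line "$f"; done > runme.sh
```

Returning a non-zero status when the link or the reference is missing means a dump without a download link produces no line instead of a broken wget command.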


 
 