How to download ALL TED Talks at once, using Bash

Topic: Technology 11:14 pm EDT, Mar  9, 2009

aurynn: ok. here we go: first part: for a in $( seq 1 34 ); do lynx -dump http://www.ted.com/index.php/talks/list/page/$a > $a.html; done

aurynn: it fetches the lists of talks, 34 pages of them, and stores each page as .html (which is bad, as lynx -dump writes plain text, not html)
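a rough sketch of the same first step with a less misleading extension. the function names, the lists directory and the .txt suffix are assumptions of this sketch, not part of the original commands; only the URL pattern and the page count of 34 come from the post.

```shell
# Build the URL of list page number $1 (URL pattern taken from the post).
list_page_url() {
    printf 'http://www.ted.com/index.php/talks/list/page/%s\n' "$1"
}

# Fetch list pages 1..$1 into directory $2 as plain-text dumps.
# The directory name and the .txt extension are assumptions of this sketch.
fetch_list_pages() {
    last_page=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    page=1
    while [ "$page" -le "$last_page" ]; do
        lynx -dump "$(list_page_url "$page")" > "$out_dir/$page.txt"
        page=$((page + 1))
    done
}

# Usage (needs lynx and network access, so it is left commented out):
# fetch_list_pages 34 lists
```

with this variant the next step would have to read lists/*.txt instead of *.html.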

aurynn: 2nd part: cat *.html | perl -ne 'print if s{^\s*\d+\.\s+(http://www.ted.com/index.php/talks/[^/]+\.html)\s*$}{$1\n}' | sort | uniq > big.list

aurynn: it generates a sorted, de-duplicated list of all talk page URLs and saves it as big.list.
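to see what that filter keeps, here is a small sketch that runs on made-up input. it swaps the perl one-liner for sed -E with the same pattern (an assumption made so the example has no perl dependency), and the talk names in the sample are hypothetical.

```shell
# Print the unique talk-page URLs found on stdin (a lynx -dump listing).
# Only numbered reference lines that point at a talk page are kept.
extract_talk_urls() {
    sed -n -E 's#^[[:space:]]*[0-9]+\.[[:space:]]+(http://www\.ted\.com/index\.php/talks/[^/]+\.html)[[:space:]]*$#\1#p' | sort -u
}

# Made-up sample of a "References" section; the talk names are hypothetical.
sample_dump='References

   1. http://www.ted.com/index.php/talks/list/page/2
   2. http://www.ted.com/index.php/talks/example_talk_b.html
   3. http://www.ted.com/index.php/talks/example_talk_a.html
   4. http://www.ted.com/index.php/talks/example_talk_b.html'

printf '%s\n' "$sample_dump" | extract_talk_urls
```

the list-page link is dropped because it has further path segments after talks/, and the duplicate talk URL is collapsed by sort -u.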

aurynn: cat big.list | while read URL; do OUTPUT=$( echo $URL | sed 's#.*/#out/#;s#html$#txt#' ); lynx -dump "$URL" > "$OUTPUT"; echo $URL; done

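it fetches all the talk pages and stores each one as a .txt file in the out/ directory, which has to exist before the loop runs. the remaining commands use *.txt, so they are presumably run from inside out/.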
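the sed expression in that loop is what turns a talk URL into an output file name. a minimal sketch of just that mapping follows; the function name and the example URL are made up for illustration.

```shell
# Map a talk URL to its output file, using the same sed expression as the post:
# everything up to the last slash becomes "out/", and a trailing "html" becomes "txt".
output_path() {
    printf '%s\n' "$1" | sed 's#.*/#out/#;s#html$#txt#'
}

output_path 'http://www.ted.com/index.php/talks/example_talk.html'
# prints: out/example_talk.txt
```

since the redirect writes into out/, running mkdir -p out first avoids a "No such file or directory" error on every iteration.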

for a in *.txt; do grep -q -E '\[[0-9]+\]Video to desktop \(Zipped MP4\)' $a || echo $a; done

this lists all .txt files which don't have a "Video to desktop" link (two talks, both of which seem to be someone singing)

i removed these .txt files
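the check can be tried safely on throwaway files first. this sketch wraps the same grep in a function (the function name and the two sample files are assumptions) and lists the files that lack the marker instead of deleting anything.

```shell
# Print every .txt file in directory $1 that lacks the
# "Video to desktop (Zipped MP4)" link marker.
list_missing_video() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue   # skip the literal pattern if no .txt files exist
        grep -q -E '\[[0-9]+\]Video to desktop \(Zipped MP4\)' "$f" || printf '%s\n' "$f"
    done
}

# Demo on two made-up files in a temporary directory.
demo_dir=$(mktemp -d)
printf '   [12]Video to desktop (Zipped MP4)\n' > "$demo_dir/with_video.txt"
printf '   [12]Watch this talk\n' > "$demo_dir/without_video.txt"
list_missing_video "$demo_dir"
```

only without_video.txt is printed, so the list can be reviewed before any file is removed.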

finally: for a in *.txt; do POS=$( grep -E '\[[0-9]+\]Video to desktop \(Zipped MP4\)' $a | sed 's/^.*\[//;s/\].*//' ); grep -E "^[[:space:]]*$POS\.[[:space:]]http" $a | sed "s/.*http/wget -O $a.zip http/;s/txt.zip/zip/"; done > runme.sh

this generates runme.sh, a list of wget commands; running it (for example with sh runme.sh) downloads every video and stores it as a .zip
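here is how one line of runme.sh comes about, sketched on a made-up page dump. the function name, the sample file and the example.com URLs are hypothetical; the two grep patterns are the ones from the command above, while printf and basename replace the final sed so that file paths containing slashes also work.

```shell
# Print the wget command for one dumped talk page ($1), or fail if the
# page has no "Video to desktop" link or no matching reference line.
wget_line() {
    file=$1
    # Reference number of the link, e.g. "[12]Video to desktop ..." gives 12.
    pos=$(grep -E '\[[0-9]+\]Video to desktop \(Zipped MP4\)' "$file" | head -n 1 | sed 's/^.*\[//;s/\].*//')
    [ -n "$pos" ] || return 1
    # URL listed under that number in the References section of the dump.
    url=$(grep -E "^[[:space:]]*$pos\.[[:space:]]+http" "$file" | head -n 1 | sed -E 's/^[[:space:]]*[0-9]+\.[[:space:]]+//')
    [ -n "$url" ] || return 1
    printf 'wget -O %s.zip %s\n' "$(basename "$file" .txt)" "$url"
}

# Made-up dump of a talk page; the URLs are placeholders.
sample_dir=$(mktemp -d)
cat > "$sample_dir/sample_talk.txt" <<'EOF'
   [12]Video to desktop (Zipped MP4)

References

   11. http://example.com/talks/list
   12. http://example.com/videos/sample_talk.zip
EOF

wget_line "$sample_dir/sample_talk.txt"
# prints: wget -O sample_talk.zip http://example.com/videos/sample_talk.zip
```

the link text only carries a number, so the real URL has to be looked up in the numbered References list that lynx -dump appends to each page.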
