Link to AOL data release

Topic: Surveillance 2:02 am EDT, Aug  7, 2006

Unbelievable. AOL released a file containing the search engine queries of over 500,000 users during a three-month period. It's being mirrored all over.

Here is a screenshot of the download page before it was taken down, complete with a spelling error: "ananomized".

Update: I've imported the data into an SQL database so I can do some data mining. It's about 3.5 GB worth of SQL, so building indexes and running any useful queries is really slow going. Sometime in the next 24 hours, I should be posting some statistics. I have to think about it some more first... From what I've gathered so far, there is no liability in doing so.

AOL fucked up. This data is in the hands of many, many, many people. That being the case, I want to see how the data frames the issues we all have with this kind of data being available to law enforcement, marketers, and others. If you have any ideas about what questions we should be asking, reply to this with your thoughts.

Since the hot-button issue most directly connected with this is child porn, I've been doing some research focusing on that. The Justice Department wanted Google and other search engines to hand over exactly this information so they could build a profile of what people are searching for when they search for child porn. I've been attempting to do the same thing. Thus far, I've gotten a pretty expansive table of users (over 300) that have been blatantly searching for child porn. I've done a fair amount of work eliminating false positives, such as people searching for information about how to protect their kids, researching court cases, or looking up information about specific offenses. I've tried to limit the list to people blatantly and repeatedly searching for illegal pictures of pre-teens and whatnot. I'm working on constructing a list of "what people who search for kiddie porn search for."

I also have some indexes building that will allow me to mine general statistical data, like what the top queries are. Since I'm working with a laptop that only has a gig of RAM and not the speediest of hard drives, it's going to take a while. I expect my machine to be churning for the next few hours.
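
For anyone wondering what the "top queries" kind of statistic looks like in practice, here is a rough sketch using Python's built-in sqlite3 module. The table name `searches`, its two columns, and the sample rows are all made up for illustration; this is not the schema or the database I'm actually using, just the general shape of an index plus a GROUP BY count.

```python
import sqlite3


def top_queries(rows, limit=3):
    """Return the most frequent queries as (query, count) pairs.

    `rows` is an iterable of (user_id, query) tuples. The `searches`
    table and its columns are hypothetical, chosen for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE searches (user_id INTEGER, query TEXT)")
    conn.executemany("INSERT INTO searches VALUES (?, ?)", rows)
    # An index on the grouped column lets the engine walk the queries in
    # sorted order instead of sorting the whole table for every count.
    conn.execute("CREATE INDEX idx_searches_query ON searches (query)")
    result = conn.execute(
        "SELECT query, COUNT(*) AS n FROM searches "
        "GROUP BY query ORDER BY n DESC, query ASC LIMIT ?",
        (limit,),
    ).fetchall()
    conn.close()
    return result


# Made-up sample rows, not taken from the released file.
sample = [
    (1, "weather"), (2, "weather"), (1, "maps"),
    (3, "weather"), (2, "maps"), (3, "news"),
]
print(top_queries(sample))
```

On a toy table this is instant. On a few gigabytes of rows, building that index is the part that keeps a small laptop churning.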

Update: I don't have powerful enough hardware to mine this. I'm waiting on more resources to become available later tonight.
