
MemeStreams Discussion



This page contains all of the posts and discussion on MemeStreams referencing the following web page: Link to AOL data release.

Link to AOL data release
by Decius at 9:33 am EDT, Aug 7, 2006

Unbelievable. AOL released a file containing the search engine queries of over 500,000 users during a three-month period. It's being mirrored all over.

Here is a screenshot of the download page before it was taken down, complete with a spelling error: "ananomized".

This will probably be a watershed moment for Internet privacy.


Link to AOL data release
by skullaria at 12:39 am EDT, Aug 7, 2006

AOL released TONS of user information in the form of search engine queries, along with whether or not result links were clicked.

While they did obliterate the names, replacing each one with a number, anyone who has egosurfed is clearly at risk, since searching for your own name ties that number back to you.

So...what am I going to do? I'm going to post a link to the file.

AOL has taken the original file down. This one sprang up shortly after.


Link to AOL data release
by Rattle at 2:02 am EDT, Aug 7, 2006

Unbelievable. AOL released a file containing the search engine queries of over 500,000 users during a three-month period. It's being mirrored all over.

Here is a screenshot of the download page before it was taken down, complete with a spelling error: "ananomized".

Update: I've imported the data into an SQL database so I can do some data mining. It's about 3.5 GB worth of SQL, so building indexes and running any useful queries is really slow going. Sometime in the next 24 hours I should be posting some statistics. I have to think about it some more first... From what I've gathered so far, there is no liability in doing so.

AOL fucked up. This data is in the hands of many, many, many people. That being the case, I want to see how the data frames the issues we all have with this kind of data being available to law enforcement, marketers, and others. Anyone who has ideas about what questions we should be asking, reply to this with your thoughts.

Since the hot-button issue most directly connected with this is child porn, I've been doing some research focused on that. The Justice Department wanted Google and other search engines to hand over exactly this kind of information so it could build a profile of what people search for when they look for child porn. I've been attempting to do the same thing. So far, I've built a fairly expansive table of users (over 300) who have been blatantly searching for child porn. I've done a fair amount of work eliminating false positives, such as people searching for information about how to protect their kids, researching court cases, or looking up information about specific offenses, and I've tried to limit the list to people blatantly and repeatedly searching for illegal pictures of pre-teens and whatnot. I'm working on constructing a list of "what people who search for kiddie porn search for."

I also have some indexes building that will let me mine general statistical data, such as what the top queries are. Since I'm working on a laptop with only a gig of RAM and not the speediest of hard drives, it's going to take a while. I expect my machine to be churning for the next few hours.
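For readers wondering what a "top queries" statistic looks like in SQL, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the log has been loaded into a table named `searches` with a `query` text column; both names are hypothetical, since the post doesn't say which database or schema was actually used. It only counts query strings in aggregate and never touches user identifiers.

```python
import sqlite3


def top_queries(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, int]]:
    """Return the most frequent query strings with their counts.

    Assumes a hypothetical table `searches` with a `query` TEXT column.
    """
    # An index on the query column lets SQLite satisfy the GROUP BY by
    # walking the index in order instead of sorting the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_query ON searches(query)")
    rows = conn.execute(
        "SELECT query, COUNT(*) AS n FROM searches "
        "GROUP BY query ORDER BY n DESC, query LIMIT ?",
        (limit,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Toy, made-up rows purely for illustration; one row per logged query.
    conn.execute("CREATE TABLE searches (query TEXT)")
    sample = ["weather", "news", "weather", "maps", "news", "weather"]
    conn.executemany("INSERT INTO searches (query) VALUES (?)", [(q,) for q in sample])
    print(top_queries(conn, limit=2))  # [('weather', 3), ('news', 2)]
```

On a multi-gigabyte table, building that index is exactly the slow, disk-bound step described above; once it exists, the grouping query itself is comparatively cheap.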

Update: I don't have powerful enough hardware to mine this. I'm waiting on more resources to become available later tonight.



