How To Break Anonymity of the Netflix Prize Dataset

Topic: Intellectual Property 5:25 pm EST, Nov 26, 2007

Anonymity is Hard.

We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge.

We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.
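The general idea in the abstract, linking a person to an "anonymous" record from a few approximately known ratings, can be illustrated with a toy sketch. This is not the authors' algorithm: the scoring rule, the `tolerance` and `min_gap` parameters, the function names, and the sample data are all assumptions made up for illustration. It only shows how a match can survive small rating errors and one outright wrong guess in the adversary's background knowledge.

```python
from typing import Dict, Optional

# A record maps a movie title to a rating from 1 to 5 (hypothetical schema).
Record = Dict[str, int]


def score(aux: Record, record: Record, tolerance: int = 1) -> float:
    """Fraction of the adversary's known ratings that the record matches.

    A rating counts as a match if the movie appears in the record and the
    two ratings differ by at most `tolerance` (an assumed fuzziness knob).
    """
    if not aux:
        return 0.0
    hits = sum(
        1
        for movie, rating in aux.items()
        if movie in record and abs(record[movie] - rating) <= tolerance
    )
    return hits / len(aux)


def best_match(
    aux: Record, dataset: Dict[str, Record], min_gap: float = 0.25
) -> Optional[str]:
    """Return the id of the top-scoring record, or None if it is ambiguous.

    The candidate is accepted only when its score beats the runner-up by at
    least `min_gap`, so a weak or tied match is not reported.
    """
    ranked = sorted(
        ((score(aux, rec), rid) for rid, rec in dataset.items()), reverse=True
    )
    if not ranked:
        return None
    top_score, top_id = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    return top_id if top_score - runner_up >= min_gap else None


# Made-up "anonymized" dataset keyed by opaque record ids.
dataset = {
    "r1": {"A": 5, "B": 1, "C": 4, "D": 2},
    "r2": {"A": 2, "B": 5, "E": 3},
    "r3": {"C": 1, "D": 5, "F": 4},
}

# Background knowledge: two ratings off by one, and one movie ("Z") that the
# target never rated in the dataset at all.
aux = {"A": 4, "B": 1, "C": 5, "Z": 3}

match = best_match(aux, dataset)
```

With this sample data, `match` is `"r1"` even though only one of the four known ratings is exactly right. When several records fit equally well, `best_match` returns `None` rather than guessing.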

See also:

Hushmail Spills it to Feds

AOL Search Database

Why Information Security is Hard

Don Kerr, on Anonymity and Privacy

Seeing Corporate Fingerprints in Wikipedia Edits

WikiScanner on the Colbert Report




 
 