Clustering Billions of Images with Large Scale Nearest Neighbor Search

noteworthy
Topic: Technology 3:06 pm EST, Feb 19, 2007

It looks like this paper is not yet in IEEE Xplore, so for now you can only buy it from the Computer Society. Here's the abstract:

The proliferation of the web and digital photography have made large scale image collections containing billions of images a reality. Image collections on this scale make performing even the most common and simple computer vision, image processing, and machine learning tasks non-trivial. An example is nearest neighbor search, which not only serves as a fundamental subproblem in many more sophisticated algorithms, but also has direct applications, such as image retrieval and image clustering. In this paper, we address the nearest neighbor problem as the first step towards scalable image processing. We describe a scalable version of an approximate nearest neighbor search algorithm and discuss how it can be used to find near duplicates among over a billion images.
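To make the near-duplicate idea concrete, here is a minimal brute-force sketch. It is not the paper's method: it compares every pair of feature vectors, which is exactly the quadratic cost that an approximate nearest neighbor search is meant to avoid at billion-image scale. The function name `near_duplicate_clusters`, the Euclidean distance, and the `threshold` parameter are all assumptions made for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def near_duplicate_clusters(vectors, threshold):
    """Group indices whose vectors are within `threshold` of each other.

    Grouping is transitive: if A is near B and B is near C, all three
    land in one cluster. Hypothetical helper, O(n^2) distance checks.
    """
    parent = list(range(len(vectors)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if dist(vectors[i], vectors[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Calling `near_duplicate_clusters(features, 0.2)` on a list of feature tuples returns lists of indices, one list per cluster. Replacing the inner loop with an approximate neighbor lookup is where a structure like the spill-tree would come in.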

Found while browsing, after following a pointer to Google's analysis of disk failures.

You probably won't pay $19 to read this paper. However, the lead author's thesis covers the same territory:

Spill-tree is designed for approximate knn search. By adapting metric-trees to a more flexible data structure, spill-tree is able to adapt to the distribution of data and it scales well even for huge high-dimensional data sets. Significant efficiency improvement has been observed comparing to LSH (locality sensitive hashing), the state of art approximate knn algorithm. We applied spill-tree to three real-world applications: shot video segmentation, drug activity detection and image clustering, which I will explain in the thesis.
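For a rough feel of how a spill-tree differs from a plain metric tree, here is a simplified sketch, not the thesis implementation. It assumes Euclidean distance, a farthest-point pivot heuristic, an overlap half-width `tau`, and a balance threshold `rho`; all names and defaults are illustrative. Each split lets points within `tau` of the boundary "spill" into both children, so a query can descend to a single leaf without backtracking and still usually find a close neighbor. When the overlap would make a child too large, the sketch falls back to a plain split. It still searches such nodes without backtracking, which a full hybrid design would not do.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def offset(p, a, b):
    """Signed distance of p from the plane midway between pivots a and b."""
    ab = [y - x for x, y in zip(a, b)]
    norm = math.sqrt(sum(c * c for c in ab))
    mid = [(x + y) / 2 for x, y in zip(a, b)]
    return sum((pc - mc) * c for pc, mc, c in zip(p, mid, ab)) / norm


class Node:
    def __init__(self, points=None, a=None, b=None, left=None, right=None):
        self.points = points          # set only on leaves
        self.a, self.b = a, b         # pivots defining the split plane
        self.left, self.right = left, right


def build(points, tau, leaf_size=8, rho=0.7):
    """Build a simplified spill-tree over a non-empty list of tuples."""
    if len(points) <= leaf_size:
        return Node(points=points)
    # Pivot heuristic (an assumption): two mutually distant points.
    a = max(points, key=lambda p: dist(p, points[0]))
    b = max(points, key=lambda p: dist(p, a))
    if dist(a, b) == 0:               # all points identical
        return Node(points=points)
    offs = [offset(p, a, b) for p in points]
    # Overlapping split: points within tau of the plane go to both sides.
    left = [p for p, o in zip(points, offs) if o <= tau]
    right = [p for p, o in zip(points, offs) if o >= -tau]
    limit = rho * len(points)
    if len(left) > limit or len(right) > limit:
        # Too much overlap: use a plain, non-overlapping split instead.
        left = [p for p, o in zip(points, offs) if o <= 0]
        right = [p for p, o in zip(points, offs) if o > 0]
    return Node(a=a, b=b,
                left=build(left, tau, leaf_size, rho),
                right=build(right, tau, leaf_size, rho))


def query(node, q):
    """Approximate nearest neighbor: descend to one leaf, never backtrack."""
    while node.points is None:
        node = node.left if offset(q, node.a, node.b) <= 0 else node.right
    return min(node.points, key=lambda p: dist(p, q))
```

Usage is `tree = build(points, tau=0.05)` followed by `query(tree, q)`. A larger `tau` duplicates more points across children, trading memory and build time for a better chance that the single leaf reached contains the true nearest neighbor.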

Her lab page also offers additional resources, including a survey of approximate nearest-neighbor algorithms and a more recent study on autonomous visualization. The latter is aimed more at the life sciences, but it's still quite interesting.

Note that the now-at-Google Ting Liu is not to be confused with the Ting Liu at Princeton, who interned with Kevin Fall at Intel Research in Berkeley. She works on delay-tolerant networks (DTNs) in sensor networks.

One of the other co-authors, Henry Rowley, has recently addressed the question of the day directly. Well, that's overstating it, but this is likely (part of) the technology behind SafeSearch. (He's also on his way to breaking the hot-or-not CAPTCHA.)
