MemeStreams | Twice Filtered

Thanks to the Internet, there is unprecedented access to sociological data. And thanks to computers, sociologists are better able to sift through that data, find trends, and test models.
At Microsoft, Smith uses public Internet data to look at the social phenomenon of online communities, and he tries to make them better for people and better for business. He recently gave a presentation regarding his work at Microsoft's TechFest in Redmond, WA, an annual event at which Microsoft researchers from around the world share their latest work. Technology Review caught up with Smith to ask him about the field of cybersociology.

Sociology at Microsoft

Systematic Topology Analysis and Generation Using Degree Correlations

Topic: Technology

4:29 pm EST, Mar 3, 2007

Researchers have proposed a variety of metrics to measure important graph properties, for instance, in social, biological, and computer networks. Values for a particular graph metric may capture a graph's resilience to failure or its routing efficiency. Knowledge of appropriate metric values may influence the engineering of future topologies, repair strategies in the face of failure, and understanding of fundamental properties of existing networks. Unfortunately, there are typically no algorithms to generate graphs matching one or more proposed metrics and there is little understanding of the relationships among individual metrics or their applicability to different settings.
We present a new, systematic approach for analyzing network topologies.
We hope that a systematic method to analyze and synthesize topologies offers a significant improvement to the set of tools available to network topology and protocol researchers.


RFC 3439 Some Internet Architectural Guidelines and Philosophy

Topic: Technology

4:18 pm EST, Mar 3, 2007

The Amplification Principle states that there are non-linearities which occur at large scale which do not occur at small to medium scale.
COROLLARY: In many large networks, even small things can and do cause huge events. In system-theoretic terms, in large systems such as these, even small perturbations on the input to a process can destabilize the system's output.
An important example of the Amplification Principle is non-linear resonant amplification, which is a powerful process that can transform dynamic systems, such as large networks, in surprising ways with seemingly small fluctuations. These small fluctuations may slowly accumulate, and if they are synchronized with other cycles, may produce major changes. Resonant phenomena are examples of non-linear behavior where small fluctuations may be amplified and have influences far exceeding their initial sizes. The natural world is filled with examples of resonant behavior that can produce system-wide changes ...
In the Internet domain, it has been shown that increased inter-connectivity results in more complex and often slower BGP routing convergence.


SIGCOMM 2007 Workshop 'IPv6 and the Future of the Internet'

Topic: Technology

10:43 pm EST, Feb 27, 2007

Topics of interest include, but are not limited to:
# Advantages and challenges the very large IPv6 address space brings to the Internet routing system
# Scalable and robust solutions to multi-homing and traffic engineering
# Host and Network Mobility
# Multicast and Anycast protocols
# Worms, DoS, and other security threats in IPv6 networks and possible enhancements to address these challenges.
# IPv6's Applicability to sensor networks, low-power personal area networks, and other types of challenged networks
# A critical assessment of IPv6's viability as a global communication infrastructure for the future or of its fundamental limitations, if any.


Heilmeier's Catechism

Topic: Technology

8:37 pm EST, Feb 22, 2007

* What are you trying to do? Articulate your objectives using absolutely no jargon.
* How is it done today, and what are the limits of current practice?
* What's new in your approach and why do you think it will be successful?
* Who cares? If you're successful, what difference will it make?
* What are the risks and the payoffs?
* How much will it cost? How long will it take?
* What are the midterm and final "exams" to check for success?

See also this interview in Business Week from 2005:

Successes usually don't teach you as much as failures. They just confirm that what you already knew was right.


Clustering Billions of Images with Large Scale Nearest Neighbor Search

Topic: Technology

3:06 pm EST, Feb 19, 2007

Looks like this paper is not yet in IEEE Xplore, so you can only buy it from the Computer Society. Here's the abstract:

The proliferation of the web and digital photography have made large scale image collections containing billions of images a reality. Image collections on this scale make performing even the most common and simple computer vision, image processing, and machine learning tasks non-trivial. An example is nearest neighbor search, which not only serves as a fundamental subproblem in many more sophisticated algorithms, but also has direct applications, such as image retrieval and image clustering. In this paper, we address the nearest neighbor problem as the first step towards scalable image processing. We describe a scalable version of an approximate nearest neighbor search algorithm and discuss how it can be used to find near duplicates among over a billion images.

Found while browsing after a pointer to Google's analysis of disk failures.

You probably won't pay $19 to read this paper. However, the lead author's thesis covers the same territory:

Spill-tree is designed for approximate knn search. By adapting metric-trees to a more flexible data structure, spill-tree is able to adapt to the distribution of data and it scales well even for huge high-dimensional data sets. Significant efficiency improvement has been observed compared to LSH (locality sensitive hashing), the state-of-the-art approximate knn algorithm. We applied spill-tree to three real-world applications: shot video segmentation, drug activity detection and image clustering, which I will explain in the thesis.

Her lab page also offers additional resources, including a survey of approximate nearest-neighbor algorithms and a more recent study on autonomous visualization. That's more for the life sciences, but still quite interesting.

Note that the now-at-Google Ting Liu is not to be confused with the Ting Liu at Princeton, who interned with Kevin Fall at Intel Research in Berkeley. She works on DTNs in sensor networks.

One of the other co-authors, Henry Rowley, has recently directly addressed the question of the day; well, that's overstating it, but this is likely (part of) the technology behind SafeSearch. (He's also on his way to breaking the hot-or-not captcha.)


Efficient Near-duplicate Detection and Sub-image Retrieval

Topic: Technology

2:54 pm EST, Feb 19, 2007

We have the technology.

We introduce a system for near-duplicate detection and sub-image retrieval. Such a system is useful for finding copyright violations and detecting forged images. We define near-duplicates as images altered with common transformations such as changing contrast, saturation, scaling, cropping, framing, etc. Our system builds a parts-based representation of images using distinctive local descriptors which give high quality matches even under severe transformations. To cope with the large number of features extracted from the images, we employ locality-sensitive hashing to index the local descriptors. This allows us to make approximate similarity queries that only examine a small fraction of the database. Although locality-sensitive hashing has excellent theoretical performance properties, a standard implementation would still be unacceptably slow for this application. We show that, by optimizing layout and access to the index data on disk, we can efficiently query indices containing millions of keypoints.
Our system achieves near-perfect accuracy (100% precision at 99.85% recall) on the tests presented in Meng et al. [16], and consistently strong results on our own, significantly more challenging experiments. Query times are interactive even for collections of thousands of images.
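
For the curious, the indexing idea in the abstract can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it swaps in random-hyperplane hashing (a common LSH family for cosine similarity), keeps the buckets in memory rather than in an optimized on-disk layout, and every name in it (`make_hyperplanes`, `lsh_key`, `LSHIndex`) is hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normals) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_key(vector, hyperplanes):
    """Hash a descriptor to a bit tuple: one bit per hyperplane side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


class LSHIndex:
    """Several independent hash tables; similar vectors tend to share buckets."""

    def __init__(self, dim, num_tables=8, num_bits=6, seed=0):
        self.tables = []
        for t in range(num_tables):
            planes = make_hyperplanes(num_bits, dim, seed=seed + t)
            self.tables.append((planes, defaultdict(list)))

    def add(self, item_id, vector):
        """Insert the descriptor into one bucket per table."""
        for planes, buckets in self.tables:
            buckets[lsh_key(vector, planes)].append(item_id)

    def candidates(self, vector):
        """Return ids sharing a bucket with the query in at least one table."""
        found = set()
        for planes, buckets in self.tables:
            found.update(buckets.get(lsh_key(vector, planes), ()))
        return found
```

A query only touches the buckets its own keys point at, which is what lets an approximate similarity search examine a small fraction of the database. A real system would still verify each candidate with an exact distance check.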


rendezvoo - spread the word about what you love

Topic: Technology

10:45 am EST, Feb 16, 2007

Spread the word about what you love! Rendezvoo is the user community where you can spread the word about the things you love -- your portfolio, your music, your websites, your best blog posts, and everything else you want to share. Post for free, and for additional exposure, pay for premium placement with Promote Now! Then, come back to discover great new things that friends, companies, groups, and the rest of the community want to spread. It's all here, which hands-down makes Rendezvoo the best place on the web to start the word-of-mouth process.


Sound of Traffic

Topic: Technology

10:23 pm EST, Feb 15, 2007

Sound of Traffic is a Java "application" which converts TCP/IP header information into MIDI notes via the Java Synthesizer. The purpose is to listen in on network traffic in ordered time, via a tempo, rather than realtime, which could be more chaotic. In this sense it becomes closer to music than noise.
Playback of traffic is sorted by source and destination addresses and ports. Ports are assigned individual MIDI instruments and played on odd or even ticks depending upon whether it is a source or destination packet. The note played by the port is based upon the number of hits (amount of traffic) occurring on the port.
Development is on hold while I develop a new package for converting numeric data from any data stream into audio (MIDI, Sampled, FM Modulation.)
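
The mapping described above could look something like the sketch below. It is written in Python for brevity and is not the project's Java code. The specifics are assumptions made for illustration: source ports land on even ticks and destination ports on odd ones, the instrument is the port number modulo 128, the pitch rises by one per hit from a hypothetical `base_note`, and packets stay in capture order rather than being sorted by address and port.

```python
from collections import Counter


def traffic_to_events(packets, base_note=36):
    """Turn (src_port, dst_port) pairs into a list of MIDI-like events.

    Each packet takes two ticks: the source port sounds on the even tick,
    the destination port on the odd tick (an assumed convention).
    """
    hits = Counter()  # running traffic count per port
    events = []
    for i, (src_port, dst_port) in enumerate(packets):
        for port, offset in ((src_port, 0), (dst_port, 1)):
            hits[port] += 1
            events.append({
                "tick": 2 * i + offset,
                # General MIDI has 128 programs, so fold the port into 0..127.
                "instrument": port % 128,
                # Busier ports play higher notes, capped at the MIDI maximum.
                "note": min(127, base_note + hits[port]),
            })
    return events
```

For example, `traffic_to_events([(50000, 80), (50001, 80)])` yields four events on ticks 0 through 3, and port 80 plays one note higher on its second hit than on its first.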

Sound of traffic is kinda neat.


YouTube - Introducing the book

Topic: Technology

9:00 pm EST, Feb 15, 2007

This video makes fun of modern newbie computer users. It's from a show called Oystein & Meg (Oystein & I) produced by the Norwegian Broadcasting television channel (NRK) in 2001. The spoken language is Norwegian.

