Hyperspaces for Object Clustering and Approximate Matching in Peer-to-Peer Overlays |
|
|
Topic: Technology |
7:26 am EDT, Aug 12, 2008 |
Existing distributed hash tables provide efficient mechanisms for storing and retrieving a data item based on an exact key, but are unsuitable when the search key is similar, but not identical, to the key used to store the data item. In this paper, we present a scalable and efficient peer-to-peer system with a new search primitive that can efficiently find the k data items with keys closest to the search key. The system works via a novel assignment of virtual coordinates to each object in a high-dimensional, synthetic space such that the proximity between two points in the coordinate space is correlated with the similarity between the strings that the points represent. We examine the feasibility of this approach for efficient, peer-to-peer search on inexact string keys, and show that the system provides a robust method to handle key perturbations that naturally occur in applications, such as file-sharing networks, where the query strings are provided by users.
Hyperspaces for Object Clustering and Approximate Matching in Peer-to-Peer Overlays |
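A rough sketch of the idea in the abstract, not the paper's coordinate assignment: each key is mapped to a point so that similar strings land near each other, and a query returns the k stored keys with the nearest points. The embedding used here (hashed character bigrams), the `DIMS` value, and the function names are all assumptions made for illustration, and the search is a local scan rather than an overlay lookup.

```python
import heapq
import math
import zlib

DIMS = 64  # assumed dimensionality of the synthetic space


def embed(key: str, dims: int = DIMS) -> list[float]:
    """Map a string to a point: hashed character-bigram counts, L2-normalised."""
    text = f"^{key.lower()}$"  # boundary markers so prefixes and suffixes count
    point = [0.0] * dims
    for i in range(len(text) - 1):
        bigram = text[i:i + 2]
        point[zlib.crc32(bigram.encode("utf-8")) % dims] += 1.0
    norm = math.sqrt(sum(x * x for x in point)) or 1.0
    return [x / norm for x in point]


def k_closest(query: str, keys: list[str], k: int) -> list[str]:
    """Return the k stored keys whose points lie nearest the query's point."""
    q = embed(query)
    return heapq.nsmallest(k, keys, key=lambda s: math.dist(q, embed(s)))
```

A perturbed query such as `k_closest("beatles yesterday", stored_keys, 1)` should then land on a stored key like `"beatles - yesterday"` even though an exact-match hash lookup would miss it.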
|
There's a Lot of It About: And everybody's doing it. |
|
|
Topic: Technology |
9:40 am EDT, Aug 1, 2008 |
Stan Kelly-Bootle: Fads and faddisms come and go thick and fast; fashions, thin and thinner, in a snowclone of clichés. In C++ terms: white = new black; purple = new white; hiphop = new rock_and_roll; small = new big; subprime = new affordable; michigan = new florida; C# = new C++;
Thus, wisdom-free information is not just here-and-there and now-and-then but all-over, all-the-time.
There's a Lot of It About: And everybody's doing it. |
|
FlexRecs: Expressing and Combining Flexible Recommendations |
|
|
Topic: Technology |
7:20 am EDT, Jul 29, 2008 |
Recommendation systems are popping up everywhere due to the abundance of their practical applications. However, most recommendation methods are "hard-wired" into the system and they support only predefined and fixed recommendations, which may not always capture the real-time user information needs. In this paper, we propose FlexRecs, a framework for flexible recommendations over relational data. With FlexRecs, a given recommendation approach can be expressed as a high-level workflow. The workflow may contain traditional relational operators such as select, project and join, but in addition, it may contain new recommendation operators that generate or combine recommendations. The workflows can easily represent both content-based and collaborative recommendation approaches, as well as new types of recommendations. Furthermore, we describe a prototype system for processing FlexRecs workflows on top of a relational database, which is used as part of a course planning tool. Finally, we present experimental results from a preliminary performance evaluation of the working system. They show that it is easy to create novel workflows with FlexRecs and that system performance is reasonable even for complex workflows.
FlexRecs: Expressing and Combining Flexible Recommendations |
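To make the workflow idea concrete, here is a toy sketch that chains relational-style operators with a recommendation operator over plain Python rows. The operator names (`select`, `project`, `recommend`), the Jaccard similarity, and the course data in the usage note are assumptions for illustration only; they are not the FlexRecs syntax or its prototype.

```python
def select(rows, predicate):
    """Relational select: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Relational project: keep only the named columns."""
    return [{c: row[c] for c in columns} for row in rows]


def jaccard(a, b):
    """Set-overlap similarity in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(candidates, reference, similarity, top_k):
    """Score each candidate by its best similarity to any reference row,
    then return the top_k candidates with a 'score' column attached."""
    scored = [
        {**c, "score": max((similarity(c, r) for r in reference), default=0.0)}
        for c in candidates
    ]
    scored.sort(key=lambda row: row["score"], reverse=True)
    return scored[:top_k]
```

A content-based workflow is then a composition: `select` the courses a student has taken, `select` the remaining candidates, `recommend` candidates by topic overlap with the taken set, and `project` the result down to an id and a score.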
|
BrowseRank: Letting Web Users Vote for Page Importance |
|
|
Topic: Technology |
7:20 am EDT, Jul 29, 2008 |
Microsoft discovers MemeStreams.
This paper proposes a new method for computing page importance, referred to as BrowseRank. The conventional approach to compute page importance is to exploit the link graph of the web and to build a model based on that graph. For instance, PageRank is such an algorithm, which employs a discrete-time Markov process as the model. Unfortunately, the link graph might be incomplete and inaccurate with respect to data for determining page importance, because links can be easily added and deleted by web content creators. In this paper, we propose computing page importance by using a ’user browsing graph’ created from user behavior data. In this graph, vertices represent pages and directed edges represent transitions between pages in the users’ web browsing history. Furthermore, the lengths of staying time spent on the pages by users are also included. The user browsing graph is more reliable than the link graph for inferring page importance. This paper further proposes using the continuous-time Markov process on the user browsing graph as a model and computing the stationary probability distribution of the process as page importance. An efficient algorithm for this computation has also been devised. In this way, we can leverage hundreds of millions of users’ implicit voting on page importance. Experimental results show that BrowseRank indeed outperforms the baseline methods such as PageRank and TrustRank in several tasks.
BrowseRank: Letting Web Users Vote for Page Importance |
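A minimal sketch of how staying time can enter the computation, under stated assumptions rather than as the paper's algorithm. It relies on a standard property of continuous-time Markov processes: the stationary distribution is proportional to the stationary distribution of the embedded jump chain multiplied by the mean holding time in each state. The function name, the `damping` reset term, and the input layout are hypothetical.

```python
def browsing_importance(transitions, mean_stay, damping=0.85, iters=200):
    """transitions: {page: {next_page: observed_count}}
    mean_stay:   {page: mean seconds spent on the page}
    Every page that appears in transitions must also appear in mean_stay."""
    pages = sorted(mean_stay)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    # Power iteration on the embedded (jump) chain, with an assumed reset term.
    for _ in range(iters):
        nxt = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            out = transitions.get(src, {})
            total = sum(out.values())
            if total == 0:
                # No observed exits: spread this page's mass uniformly.
                for p in pages:
                    nxt[p] += damping * rank[src] / n
            else:
                for dst, count in out.items():
                    nxt[dst] += damping * rank[src] * count / total
        rank = nxt

    # Weight by mean staying time and renormalise to a distribution.
    weighted = {p: rank[p] * mean_stay[p] for p in pages}
    z = sum(weighted.values())
    return {p: w / z for p, w in weighted.items()}
```

With two pages that users bounce between equally often, the jump chain alone rates them the same; if users stay 30 seconds on one and 10 on the other, the first ends up with three times the importance.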
|
Topic: Technology |
6:44 am EDT, Jul 16, 2008 |
The Social Graph is a misleading distraction, a handy buzzword we can all slip into our cocktail conversations. But the real value is in the personal, independent social graph we all have. Plural. If you think about it, that’s the only way you can really make sense of it in our user-centric, user-driven world.
Joe Andrieu should see the graphs at MemeStreams; maybe Jello is right about sharing the graphs with ET. It's less about the choice of layout and more about the conceptualization.
Social Graph is Plural |
|
Engineers' Dreams, by George Dyson |
|
|
Topic: Technology |
6:44 am EDT, Jul 16, 2008 |
Only one third of a search engine is devoted to fulfilling search requests. The other two thirds are divided between crawling (sending a host of single-minded digital organisms out to gather information) and indexing (building data structures from the results). Ed's job was to balance the resulting loads. When Ed examined the traffic, he realized that Google was doing more than mapping the digital universe. Google doesn't merely link or point to data. It moves data around. Data that are associated frequently by search requests are locally replicated—establishing physical proximity, in the real universe, that is manifested computationally as proximity in time. Google was more than a map. Google was becoming something else. ...
Engineers' Dreams, by George Dyson |
|
Beyond Node Degree: Evaluating AS Topology Models |
|
|
Topic: Technology |
7:27 am EDT, Jul 15, 2008 |
Many models have been proposed to generate Internet Autonomous System (AS) topologies, most of which make structural assumptions about the AS graph. In this paper we compare AS topology generation models with several observed AS topologies. In contrast to most previous works, we avoid making assumptions about which topological properties are important to characterize the AS topology. Our analysis shows that, although matching degree-based properties, the existing AS topology generation models fail to capture the complexity of the local interconnection structure between ASs. Furthermore, we use BGP data from multiple vantage points to show that additional measurement locations significantly affect local structure properties, such as clustering and node centrality. Degree-based properties, however, are not notably affected by additional measurement locations. These observations are particularly valid in the core. The shortcomings of AS topology generation models stem from an underestimation of the complexity of the connectivity in the core caused by inappropriate use of BGP data.
Beyond Node Degree: Evaluating AS Topology Models |
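A small illustration of why matching degree-based properties is not enough: the two toy graphs below have identical degree sequences but opposite local clustering. The graphs and helper names are made up for the example and have nothing to do with real AS data.

```python
from itertools import combinations


def from_edges(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (len(nbrs) * (len(nbrs) - 1))


# Every node has degree 2 in both graphs.
ring = from_edges([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
triangles = from_edges([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)])
```

A generator tuned only to the degree distribution cannot tell `ring` from `triangles`, while the clustering coefficient separates them completely.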
|
Topic: Technology |
7:09 am EDT, Jul 14, 2008 |
In contrast with most internet topology measurement research, our concern here is not to obtain a map as complete and precise as possible of the whole internet. Instead, we claim that each machine's view of this topology, which we call ego-centered view, is an object worthy of study in itself. We design and implement an ego-centered measurement tool, and perform radar-like measurements consisting of repeated measurements of such views of the internet topology. We conduct long-term (several weeks) and high-speed (one round every few minutes) measurements of this kind from more than one hundred monitors, and we provide the obtained data. We also show that these data may be used to detect events in the dynamics of internet topology.
A Radar for the Internet |
|
Draft Guidelines on Cell Phone and PDA Security |
|
|
Topic: Technology |
6:43 am EDT, Jul 10, 2008 |
Cell phones and personal digital assistants (PDAs) have become indispensable tools for today's highly mobile workforce. Small and relatively inexpensive, these devices can be used for many functions, including sending and receiving email, storing documents, delivering presentations, and remotely accessing data. While these devices provide productivity benefits, they also pose new risks to an organization’s security. This document provides an overview of cell phone and PDA devices in use today and offers insights into making informed information technology security decisions on their treatment. The document gives details about the threats and technology risks associated with these devices and the available safeguards to mitigate them. Organizations can use this information to enhance security and reduce incidents involving handheld devices.
Draft Guidelines on Cell Phone and PDA Security |
|
The Correspondence Analysis Platform for Uncovering Deep Structure in Data and Information |
|
|
Topic: Technology |
7:04 am EDT, Jul 9, 2008 |
We study two aspects of information semantics: (i) the collection of all relationships, (ii) tracking and spotting anomaly and change. The first is implemented by endowing all relevant information spaces with a Euclidean metric in a common projected space. The second is modelled by an induced ultrametric. A very general way to achieve a Euclidean embedding of different information spaces based on cross-tabulation counts (and from other input data formats) is provided by Correspondence Analysis. From there, the induced ultrametric that we are particularly interested in takes a sequential - e.g. temporal - ordering of the data into account. We employ such a perspective to look at narrative, "the flow of thought and the flow of language" (Chafe). In application to policy decision making, we show how we can focus analysis in a small number of dimensions.
The Correspondence Analysis Platform for Uncovering Deep Structure in Data and Information |
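For the Euclidean side of this, a short sketch of the chi-square distance between row profiles of a cross-tabulation, which is the distance that Correspondence Analysis reproduces as ordinary Euclidean distance in its full factor space. This covers only that one step (no decomposition, no ultrametric); the function name is hypothetical, and the table is assumed to have no all-zero rows or columns.

```python
import math


def chi_square_distance(table, i, k):
    """Chi-square distance between rows i and k of a table of counts."""
    grand_total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    # Column masses: each column's share of the grand total.
    col_mass = [sum(row[j] for row in table) / grand_total for j in range(n_cols)]
    # Row profiles: each row rescaled to sum to 1.
    profile_i = [x / sum(table[i]) for x in table[i]]
    profile_k = [x / sum(table[k]) for x in table[k]]
    return math.sqrt(
        sum((a - b) ** 2 / m for a, b, m in zip(profile_i, profile_k, col_mass))
    )
```

Rows with the same proportions across columns sit at distance zero no matter how different their totals are, which is why profiles rather than raw counts are compared.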
|