Efficient evaluation of nearest-neighbor queries in content-addressable networks

Authors:
Erik Buchmann; Klemens Böhm
Affiliations:
University of Magdeburg, Germany; University of Karlsruhe, Germany
Venue:
From Integrated Publication and Information Systems to Virtual Information and Knowledge Environments
Year:
2005

Abstract

Content-Addressable Networks (CAN) manage huge sets of (key, value) pairs and cope with very high workloads. They follow the peer-to-peer (P2P) paradigm to build scalable, distributed data structures on top of the Internet. CAN are designed to drive Internet-scale applications such as distributed search engines and multimedia retrieval systems. In these scenarios, the nearest-neighbor (NN) query model is very natural: the user specifies a query key, and the engine responds with the set of results closest to the key. Implementing NN queries in CAN is challenging. As in any P2P system, global knowledge about the peers responsible for parts of the query result is not available, and communication overhead is the most critical factor. In this paper, we present our approach to realizing efficient NN queries in CAN. We evaluate our NN query processing scheme through experiments with a CAN implementation in a setting derived from web applications. The results of our experiments with 10,000 peers are positive: even large result sets with a precision of 75% can be obtained by invoking fewer than 1.6 peers on average. In addition, our NN protocol is suitable for prefetching in settings with sequences of consecutive queries for similar keys.
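
The NN query model and the precision figure mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' protocol. It assumes a 2-D key space [0, 1)^2 split into a regular grid with one zone per peer (real CAN zones arise from successive splits and are irregular), and it assumes that precision means the fraction of returned keys that belong to the exact k-nearest set. All function names are hypothetical.

```python
import math
import random


def zone_of(key, grid):
    """Map a 2-D key in [0, 1)^2 to the (column, row) of the zone owning it."""
    return (min(int(key[0] * grid), grid - 1),
            min(int(key[1] * grid), grid - 1))


def build_can(keys, grid):
    """Assumed layout: grid x grid peers, each storing the keys in its zone."""
    peers = {}
    for key in keys:
        peers.setdefault(zone_of(key, grid), []).append(key)
    return peers


def k_nearest(candidates, query, k):
    """Return the k candidates closest to the query (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(c, query))[:k]


def local_nn_query(peers, query, k, grid):
    """Approximate NN query that invokes only the peer owning the query key."""
    return k_nearest(peers.get(zone_of(query, grid), []), query, k)


def precision(approx, exact):
    """Assumed definition: share of returned keys that are true nearest neighbors."""
    if not approx:
        return 0.0
    return len(set(approx) & set(exact)) / len(approx)


if __name__ == "__main__":
    random.seed(0)
    keys = [(random.random(), random.random()) for _ in range(2000)]
    peers = build_can(keys, grid=4)
    query = (0.4, 0.6)
    exact = k_nearest(keys, query, 10)       # needs global knowledge
    approx = local_nn_query(peers, query, 10, grid=4)  # one peer invoked
    print("precision with one peer invoked:", precision(approx, exact))
```

Invoking a single peer is the cheapest possible strategy. It loses precision when the query key lies near a zone border, which is the trade-off between communication overhead and result quality that the abstract refers to.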