Aggregation of a term vocabulary for P2P-IR: a DHT stress test

Authors:
Fabius Klemm;Karl Aberer
Affiliations:
School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland;School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Venue:
DBISP2P'05/06 Proceedings of the 2005/2006 international conference on Databases, information systems, and peer-to-peer computing
Year:
2005

Abstract

Interest in full-text retrieval based on peer-to-peer (P2P) technology has been growing. So far, research efforts have largely concentrated on distributing an index efficiently. However, ranking the results retrieved from the index is a crucial part of information retrieval. To determine the relevance of a document to a query, ranking algorithms use collection-wide statistics. Term frequency-inverse document frequency (TF-IDF), for example, relies on the number of documents in the whole collection that contain a given term. Such global frequencies are not readily available in a distributed system. In this paper, we study the feasibility of aggregating global frequencies for a large term vocabulary in a P2P setting. We use a distributed hash table (DHT) for our analysis. Traditional applications of DHTs, such as file sharing, index keys on the order of tens of thousands. Aggregating a vocabulary of millions of terms places extreme demands on a DHT implementation. We study different aggregation strategies and propose optimizations that allow DHTs to process large numbers of keys efficiently.
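To make the problem concrete, the following is a minimal sketch of how global document frequencies might be aggregated when each term is assigned to one responsible peer. It is not the authors' method: the abstract does not describe their aggregation strategies or optimizations. The hash-modulo peer assignment stands in for a real DHT lookup, the one-message-per-term strategy is a deliberately naive assumption, and all function names are hypothetical.

```python
import hashlib
import math
from collections import Counter, defaultdict


def responsible_peer(term, num_peers):
    """Map a term to a peer ID by hashing (a stand-in for a DHT lookup)."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_peers


def local_document_frequencies(documents):
    """For one peer, count how many of its documents contain each term."""
    df = Counter()
    for doc in documents:
        df.update(set(doc.lower().split()))  # count each term once per document
    return df


def aggregate_global_df(peer_collections):
    """Send every peer's local counts to the peer responsible for each term.

    Assumed naive strategy: one message per (peer, term) pair.
    """
    num_peers = len(peer_collections)
    stores = defaultdict(Counter)  # peer ID -> {term: global document frequency}
    messages = 0
    for documents in peer_collections:
        for term, count in local_document_frequencies(documents).items():
            stores[responsible_peer(term, num_peers)][term] += count
            messages += 1
    return stores, messages


def idf(term, stores, num_peers, total_docs):
    """Inverse document frequency computed from the aggregated global counts."""
    df = stores[responsible_peer(term, num_peers)][term]
    return math.log(total_docs / df) if df else 0.0


# Two peers, each holding two short documents (toy data for illustration).
peers = [["dht routing table", "dht lookup"], ["term index", "dht index"]]
stores, messages = aggregate_global_df(peers)
print(messages, idf("dht", stores, len(peers), total_docs=4))
```

In this naive form the message count grows with the number of distinct terms held by each peer, which hints at why a vocabulary of millions of terms stresses a DHT far more than an application that indexes tens of thousands of keys.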