Peer-to-peer similarity search over widely distributed document collections

Authors:
Christos Doulkeridis; Kjetil Nørvåg; Michalis Vazirgiannis
Affiliations:
AUEB, Athens, Greece; NTNU, Trondheim, Norway; AUEB, Athens, Greece
Venue:
Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval
Year:
2008

Abstract

This paper addresses the challenging problem of similarity search over widely distributed, ultra-high-dimensional data. One such application is the retrieval of the top-k most similar documents from a widely distributed document collection, as in the case of digital libraries. Peer-to-peer (P2P) systems have emerged as a promising solution for content management over highly distributed data collections. We propose a self-organizing P2P approach in which an unstructured P2P network evolves into a super-peer architecture, with each super-peer responsible for peers with similar content. Our approach is based on distributed clustering of peer contents and thereby creates high-quality clusters that span the entire network. More importantly, we show how to process similarity queries efficiently by capitalizing on the newly constructed, clustered super-peer network. During query processing, the query is propagated only to a few carefully selected super-peers that are able to return high-quality results. We evaluate the performance of our approach and demonstrate its advantages through simulation experiments on two document collections.
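The following is a minimal, illustrative sketch of the general idea described in the abstract (group peers with similar content under super-peers, then route a top-k query only to the most promising groups). It is not the authors' algorithm. Several simplifying assumptions are made: documents are sparse term-weight dictionaries (e.g. tf-idf), each peer is summarized by the centroid of its documents, and peers are grouped by a centralized k-means variant that assigns by cosine similarity with farthest-first seeding, standing in for the distributed clustering the paper proposes. The names `cluster_peers` and `search` and the parameters `m` (number of super-peers contacted) and `k` (result size) are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse term -> weight (assumed tf-idf-like)
Peer = Dict[str, Vector]   # doc_id -> document vector


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a list of sparse vectors ({} if the list is empty)."""
    c: Vector = {}
    for v in vectors:
        for t, w in v.items():
            c[t] = c.get(t, 0.0) + w / len(vectors)
    return c


def cluster_peers(peers: Dict[str, Peer], k: int, rounds: int = 10
                  ) -> Tuple[Dict[int, List[str]], List[Vector]]:
    """Hypothetical: group peers into k clusters, one per super-peer (centralized here)."""
    summaries = {p: centroid(list(docs.values())) for p, docs in peers.items()}
    ids = sorted(summaries)
    assert 0 < k <= len(ids)
    # Farthest-first seeding: repeatedly add the peer least similar to all seeds so far.
    seeds = [ids[0]]
    while len(seeds) < k:
        seeds.append(min((p for p in ids if p not in seeds),
                         key=lambda p: max(cosine(summaries[p], summaries[s]) for s in seeds)))
    centers = [summaries[s] for s in seeds]
    clusters: Dict[int, List[str]] = {}
    for _ in range(rounds):
        # Assign each peer to the cluster whose center is most cosine-similar.
        clusters = {i: [] for i in range(k)}
        for p in ids:
            best = max(range(k), key=lambda i: cosine(summaries[p], centers[i]))
            clusters[best].append(p)
        # Recompute centers; an empty cluster keeps its previous center.
        centers = [centroid([summaries[p] for p in clusters[i]]) if clusters[i] else centers[i]
                   for i in range(k)]
    return clusters, centers


def search(query: Vector, peers: Dict[str, Peer], clusters: Dict[int, List[str]],
           centers: List[Vector], m: int, k: int
           ) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Hypothetical: route the query to the m best clusters and merge the top-k."""
    # Super-peer selection: rank clusters by similarity of their center to the query.
    chosen = sorted(range(len(centers)), key=lambda i: cosine(query, centers[i]),
                    reverse=True)[:m]
    contacted: List[str] = []
    candidates: List[Tuple[float, str]] = []
    for i in chosen:
        for p in clusters[i]:
            contacted.append(p)
            # Each contacted peer returns only its local top-k.
            local = ((cosine(query, vec), doc_id) for doc_id, vec in peers[p].items())
            candidates.extend(heapq.nlargest(k, local))
    top = heapq.nlargest(k, candidates)  # merge step at the super-peer(s)
    return [(doc_id, score) for score, doc_id in top], contacted


# Toy example: two peers about P2P, two about genomics.
peers = {
    "p1": {"d1": {"p2p": 1.0, "overlay": 1.0}, "d2": {"p2p": 1.0, "routing": 1.0}},
    "p2": {"d3": {"p2p": 1.0, "overlay": 0.5}},
    "p3": {"d4": {"protein": 1.0, "gene": 1.0}},
    "p4": {"d5": {"gene": 1.0, "sequence": 1.0}},
}
clusters, centers = cluster_peers(peers, k=2)
results, contacted = search({"p2p": 1.0, "overlay": 1.0}, peers, clusters, centers, m=1, k=2)
print(clusters, results, contacted)
```

In this toy run the peers split into a P2P group (`p1`, `p2`) and a genomics group (`p3`, `p4`), and with `m=1` only the P2P group is contacted, returning `d1` and `d3`. Contacting fewer super-peers reduces network cost at the risk of missing relevant documents held in other clusters; `m` controls that trade-off in this sketch.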