Peer-to-peer information retrieval using self-organizing semantic overlay networks

  • Authors:
  • Chunqiang Tang; Zhichen Xu; Sandhya Dwarkadas

  • Affiliations:
  • University of Rochester, Rochester, NY; HP Laboratories, Palo Alto, CA; University of Rochester, Rochester, NY

  • Venue:
  • Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications
  • Year:
  • 2003

Abstract

Content-based full-text search is a challenging problem in Peer-to-Peer (P2P) systems. Traditional approaches have either been centralized or used flooding to ensure the accuracy of the results returned. In this paper, we present pSearch, a decentralized non-flooding P2P information retrieval system. pSearch distributes document indices through the P2P network based on document semantics generated by Latent Semantic Indexing (LSI). The search cost (in terms of the number of distinct nodes searched and the amount of data transmitted) for a given query is thereby reduced, since the indices of semantically related documents are likely to be co-located in the network. We also describe techniques that help distribute the indices more evenly across the nodes and that further reduce the number of nodes accessed, using appropriate index distribution as well as index samples and recently processed queries to guide the search. Experiments show that pSearch can achieve performance comparable to centralized information retrieval systems by searching only a small number of nodes. For a system with 128,000 nodes and 528,543 documents (from news, magazines, etc.), pSearch searches only 19 nodes and transmits only 95.5 KB of data during the search, while the top 15 documents returned by pSearch and LSI have a 91.7% intersection.
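
The core idea in the abstract, placing index entries by semantic vector so that a query only needs to visit a few nodes, can be illustrated with a toy sketch. This is not the pSearch implementation: the vectors are random placeholders rather than real LSI output, each node is simply assigned a hypothetical point in the semantic space, and a nearest-point lookup stands in for overlay routing. All function names are made up for illustration. The overlap function mirrors the kind of top-k intersection measure the abstract reports.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def random_vector(rng, dim):
    """Placeholder for an LSI semantic vector (assumption: random data)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def nearest_nodes(vec, node_points, count):
    """Ids of the `count` nodes whose points are most similar to `vec`."""
    ranked = sorted(
        range(len(node_points)),
        key=lambda n: cosine(vec, node_points[n]),
        reverse=True,
    )
    return ranked[:count]


def build_index(doc_vectors, node_points):
    """Store each document id on the node closest to its semantic vector."""
    index = {n: [] for n in range(len(node_points))}
    for doc_id, vec in enumerate(doc_vectors):
        owner = nearest_nodes(vec, node_points, 1)[0]
        index[owner].append(doc_id)
    return index


def top_k(query, doc_ids, doc_vectors, k):
    """The k documents among `doc_ids` most similar to the query."""
    return sorted(
        doc_ids, key=lambda d: cosine(query, doc_vectors[d]), reverse=True
    )[:k]


def search(query, index, node_points, doc_vectors, k, nodes_to_visit):
    """Rank only the documents held by the nodes nearest the query."""
    visited = nearest_nodes(query, node_points, nodes_to_visit)
    candidates = [d for n in visited for d in index[n]]
    return top_k(query, candidates, doc_vectors, k), visited


def overlap(found, reference):
    """Fraction of the reference result set that also appears in `found`."""
    if not reference:
        return 0.0
    return len(set(found) & set(reference)) / len(reference)


# Toy run: 32 nodes, 500 documents, 4-dimensional vectors, visit 4 nodes.
rng = random.Random(0)
node_points = [random_vector(rng, 4) for _ in range(32)]
doc_vectors = [random_vector(rng, 4) for _ in range(500)]
index = build_index(doc_vectors, node_points)
query = random_vector(rng, 4)

reference = top_k(query, range(len(doc_vectors)), doc_vectors, 15)
found, visited = search(query, index, node_points, doc_vectors, 15, 4)
print(f"visited {len(visited)} of {len(node_points)} nodes, "
      f"top-15 overlap {overlap(found, reference):.1%}")
```

Raising `nodes_to_visit` trades search cost for accuracy: visiting every node reproduces the exhaustive ranking, while visiting fewer nodes relies on similar documents having been placed near the query's point.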