Probabilistic file indexing and searching in unstructured peer-to-peer networks

  • Authors:
  • An-Hsun Cheng; Yuh-Jzer Joung

  • Affiliations:
  • Department of Information Management, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 106, Taiwan, ROC; Department of Information Management, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 106, Taiwan, ROC

  • Venue:
  • Computer Networks: The International Journal of Computer and Telecommunications Networking
  • Year:
  • 2006

Abstract

Thanks to advances in network and computing technology, Peer-to-Peer (P2P) has become a popular way to share files. A huge number of files can now be directly accessed and downloaded with a single mouse click. Among the types of P2P networks, the unstructured architecture has proven quite successful, mainly due to its simplicity and robustness. However, searching for distant and rare files remains a challenging problem in unstructured P2P networks. Existing approaches either have poor response time or generate too much network traffic. In this paper we propose a simple, practical, yet powerful index scheme to enhance search in unstructured P2P networks. The index scheme uses a data structure called a "Bloom filter" to index the files shared at each node, and then lets nodes gossip with one another to exchange their Bloom filters. In effect, each node indexes a random set of files in the network, thereby giving every query a constant probability of being successfully resolved within a fixed search space. The experimental results show that our approach can improve search in Gnutella by an order of magnitude. For example, in a typical Gnutella network consisting of about 89,000 nodes, by replicating a node's Bloom filter to less than 0.45% of the nodes in the network, 70% of the queries can be resolved within a search space of 200 nodes. In contrast, within the same search space size, only 1.6% of the queries can be resolved without the index scheme; alternatively, more than 48,000 nodes need to be searched in Gnutella to reach the same success rate as our index scheme.
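
To make the indexing idea concrete, the following is a minimal sketch of a node that summarizes its shared files in a Bloom filter and checks filters received from other nodes before contacting them. It is an illustration only, not the authors' implementation: the class names `BloomFilter` and `Node`, the filter size `m`, the hash count `k`, the SHA-256 based hashing, and the `receive_filter` step standing in for gossip are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; m bits and k hashes are illustrative choices."""

    def __init__(self, m: int = 1024, k: int = 4):
        self.m = m
        self.k = k
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class Node:
    """Hypothetical peer: its own files plus filters gossiped from others."""

    def __init__(self, node_id: str, files):
        self.node_id = node_id
        self.files = set(files)
        self.own_filter = BloomFilter()
        for name in self.files:
            self.own_filter.add(name)
        self.remote_filters = {}  # node_id -> BloomFilter received via gossip

    def receive_filter(self, other: "Node") -> None:
        # Stand-in for one gossip exchange (assumed, simplified to one direction).
        self.remote_filters[other.node_id] = other.own_filter

    def candidates(self, filename: str):
        # Peers whose filter matches; each must still be contacted to confirm,
        # because a Bloom filter can report false positives.
        return sorted(nid for nid, bf in self.remote_filters.items()
                      if bf.might_contain(filename))


a = Node("a", ["song.mp3", "paper.pdf"])
b = Node("b", ["rare-file.iso"])
c = Node("c", ["notes.txt"])
c.receive_filter(a)
c.receive_filter(b)
print(c.candidates("rare-file.iso"))
```

Because a Bloom filter never produces false negatives, node `b` is always among the candidates returned for `rare-file.iso`; other nodes may occasionally appear as false positives, which is why a match is treated as a hint rather than a confirmed hit.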