Adaptive indexing for content-based search in P2P systems

  • Authors:
  • Aoying Zhou;Rong Zhang;Weining Qian;Quang Hieu Vu;Tianming Hu

  • Affiliations:
  • Software Engineering Institute, East China Normal University, China and Department of Computer Science and Engineering, Fudan University, China;Software Engineering Institute, East China Normal University, China and Department of Computer Science and Engineering, Fudan University, China;Software Engineering Institute, East China Normal University, China;Singapore MIT Alliance, National University of Singapore, Singapore;Software Engineering Institute, East China Normal University, China

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

One of the major challenges in Peer-to-Peer (P2P) file sharing systems is to support content-based search. Although there have been some proposals to address this challenge, they share the same weakness of using either servers or super-peers to keep global knowledge, which is required to identify importance of terms to avoid popular terms in query processing. As a result, they are not scalable and are prone to the bottleneck problem, which is caused by the high visiting load at the global knowledge maintainers. To that end, in this paper, we propose a novel adaptive indexing approach for content-based search in P2P systems, which can identify importance of terms without keeping global knowledge. Our method is based on an adaptive indexing structure that combines a Chord ring and a balanced tree. The tree is used to aggregate and classify terms adaptively, while the Chord ring is used to index terms of nodes in the tree. Specifically, at each node of the tree, the system classifies terms as either important or unimportant. Important terms, which can distinguish the node from its neighbor nodes, are indexed in the Chord ring. On the other hand, unimportant terms, which are either popular or rare terms, are aggregated to higher level nodes. Such classification enables the system to process queries on the fly without the need for global knowledge. Besides, compared to the methods that index terms separately, term aggregation reduces the indexing cost significantly. Taking advantage of the tree structure, we also develop an efficient search algorithm to tackle the bottleneck problem near the root. Finally, our extensive experiments on both benchmark and Wikipedia datasets validated the effectiveness and efficiency of the proposed method.