Adaptive indexing for content-based search in P2P systems

Authors:
Aoying Zhou;Rong Zhang;Weining Qian;Quang Hieu Vu;Tianming Hu
Affiliations:
Software Engineering Institute, East China Normal University, China and Department of Computer Science and Engineering, Fudan University, China;Software Engineering Institute, East China Normal University, China and Department of Computer Science and Engineering, Fudan University, China;Software Engineering Institute, East China Normal University, China;Singapore MIT Alliance, National University of Singapore, Singapore;Software Engineering Institute, East China Normal University, China
Venue:
Data & Knowledge Engineering
Year:
2008

Abstract

One of the major challenges in Peer-to-Peer (P2P) file sharing systems is supporting content-based search. Existing proposals share a common weakness: they rely on either servers or super-peers to keep the global knowledge needed to identify the importance of terms, so that popular terms can be avoided in query processing. As a result, they do not scale and are prone to bottlenecks caused by the high access load at the nodes that maintain the global knowledge. In this paper, we propose a novel adaptive indexing approach for content-based search in P2P systems that identifies the importance of terms without keeping global knowledge. Our method is based on an adaptive indexing structure that combines a Chord ring and a balanced tree. The tree is used to aggregate and classify terms adaptively, while the Chord ring is used to index the terms of nodes in the tree. Specifically, at each node of the tree, the system classifies terms as either important or unimportant. Important terms, which distinguish the node from its neighbor nodes, are indexed in the Chord ring. Unimportant terms, which are either popular or rare, are aggregated to higher-level nodes. This classification enables the system to process queries on the fly without global knowledge. Moreover, compared to methods that index terms separately, term aggregation reduces the indexing cost significantly. Taking advantage of the tree structure, we also develop an efficient search algorithm that tackles the bottleneck problem near the root. Finally, extensive experiments on both benchmark and Wikipedia datasets validate the effectiveness and efficiency of the proposed method.
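
The classify-then-index step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state how importance is measured, so the relative-frequency thresholds `low` and `high`, the ring size `M`, and the helper names `ring_id`, `successor`, `classify`, and `index_node` are all assumptions introduced here for illustration. The sketch keeps terms of middling frequency at a tree node as "important" and places them on a Chord-style identifier ring, while popular and rare terms are returned for aggregation at the parent node.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

M = 16  # identifier bits of the toy ring (assumption, not from the paper)


def ring_id(key: str) -> int:
    """Hash a string onto the 2**M identifier circle, Chord-style."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(peer_ids, key_id):
    """Return the first peer id at or clockwise after key_id.

    peer_ids must be sorted; the lookup wraps around the ring.
    """
    i = bisect_left(peer_ids, key_id)
    return peer_ids[i % len(peer_ids)]


def classify(term_counts, low=0.05, high=0.5):
    """Split terms into (important, unimportant) counters.

    Hypothetical rule: a term is important when its share of the node's
    term occurrences lies in [low, high]; rarer or more popular terms
    are unimportant.
    """
    total = sum(term_counts.values())
    important, unimportant = Counter(), Counter()
    for term, count in term_counts.items():
        share = count / total
        if low <= share <= high:
            important[term] = count
        else:
            unimportant[term] = count
    return important, unimportant


def index_node(node, term_counts, peer_ids, ring_index):
    """Index a tree node's important terms on the ring.

    Returns the unimportant terms so the caller can aggregate them
    into the parent node.
    """
    important, unimportant = classify(term_counts)
    for term in important:
        peer = successor(peer_ids, ring_id(term))
        ring_index.setdefault(peer, {}).setdefault(term, []).append(node)
    return unimportant


if __name__ == "__main__":
    peers = sorted(ring_id(f"peer-{n}") for n in range(4))
    ring_index = {}
    leaf_a = Counter({"the": 60, "chord": 30, "dht": 9, "zyx": 1})
    leaf_b = Counter({"the": 70, "wavelet": 25, "image": 4, "qq": 1})
    # Unimportant terms from both leaves are merged at their parent.
    parent = index_node("A", leaf_a, peers, ring_index)
    parent += index_node("B", leaf_b, peers, ring_index)
    index_node("parent", parent, peers, ring_index)
    print(ring_index)
```

In this toy setup each node decides locally which terms to publish, so no peer needs collection-wide term statistics. A real system would also need the tree maintenance, the neighbor comparison, and the search algorithm that the abstract mentions, none of which are modeled here.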