Peer-to-peer (P2P) systems have recently been proposed for providing search and information retrieval facilities over distributed data sources, including web data. Terms and their document frequencies are the main building blocks of retrieval and, as such, need to be computed, aggregated, and distributed throughout the system. This is a challenging task, because the local view of each peer may not reflect the global document collection when documents are distributed unevenly across peers. Moreover, centrally assembling the complete information is not feasible, both because of the prohibitive cost of storage and maintenance and because of issues related to digital rights management. In this paper, we propose an efficient approach for aggregating the document frequencies of carefully selected terms over a hierarchical overlay network. To this end, we examine unsupervised feature selection techniques at the individual peer level in order to identify a limited set of the most important terms for aggregation. We provide a theoretical analysis of the cost of our approach, and we conduct experiments on two document collections to measure the quality of the aggregated document frequencies.
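The general idea of combining per-peer term selection with hierarchical aggregation can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not name the specific feature selection techniques or overlay structure, so this example assumes that each peer keeps its `k` terms with the highest local document frequency (one simple unsupervised criterion) and that the overlay is a tree in which every node sums the counts reported by its children. The names `OverlayNode`, `select_terms`, and `aggregate` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


def local_document_frequencies(documents):
    """Count, for each term, how many local documents contain it."""
    df = Counter()
    for doc in documents:
        df.update(set(doc.lower().split()))  # a set, so each doc counts a term once
    return df


def select_terms(df, k):
    """Hypothetical unsupervised selection: keep the k terms with the highest
    local DF. Ties are broken alphabetically so the result is deterministic."""
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:k])


@dataclass
class OverlayNode:
    """A node in an assumed tree-shaped overlay; leaves are peers holding documents."""
    name: str
    documents: list = field(default_factory=list)
    children: list = field(default_factory=list)


def aggregate(node, k):
    """Sum the DFs of each node's selected terms bottom-up through the tree.

    Returns (term -> aggregated DF, total number of documents below node).
    """
    totals = Counter(select_terms(local_document_frequencies(node.documents), k))
    n_docs = len(node.documents)
    for child in node.children:
        child_df, child_n = aggregate(child, k)
        totals.update(child_df)  # Counter.update adds counts rather than replacing them
        n_docs += child_n
    return totals, n_docs


# Example: two peers under a single aggregating root, each selecting k=2 terms.
peer_a = OverlayNode("a", documents=["p2p search", "p2p overlay network"])
peer_b = OverlayNode("b", documents=["distributed search", "search engine"])
root = OverlayNode("root", children=[peer_a, peer_b])
df, n = aggregate(root, k=2)
print(n, dict(df))
```

In this example "search" appears in three of the four documents, but peer `a` does not select it, so the aggregated value is 2. Because peers only report the terms they select, aggregated frequencies under this scheme are lower bounds on the true global values, which is why the quality of the aggregated frequencies needs to be measured.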