Scalable search and retrieval over numerous web document collections distributed across different sites can be achieved by adopting a peer-to-peer (P2P) communication model. Terms and their document frequencies are the main components of text information retrieval, and as such they need to be computed, aggregated, and distributed throughout the system. This is a challenging problem in unstructured P2P networks, because skews in the distribution of documents to peers mean that a peer's local document collection may not accurately reflect the global collection. Moreover, assembling all of the information centrally is not a scalable solution, both because of the excessive cost of storage and maintenance and because of issues related to digital rights management. In this paper, we present an efficient hybrid approach for the aggregation of document frequencies: a hierarchical overlay network is used for a carefully selected set of the most important terms, and gossip-based aggregation is used for the remaining terms in the collections. Furthermore, we present a cost analysis of the communication cost of hybrid aggregation. We conduct experiments on three document collections to evaluate the quality of the proposed hybrid aggregation.
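To make the gossip-based part of the idea concrete, the following is a minimal sketch of pairwise averaging over per-peer document frequencies. It is not the protocol from the paper: it is a generic push-pull averaging scheme, and the function names (`gossip_average`, `estimate_global_df`), the uniform random choice of partner, and the assumption that every peer knows the network size and the full term set are all illustrative simplifications. The hierarchical overlay for the most important terms is not modelled here.

```python
import random


def gossip_average(local_df, rounds=30, seed=0):
    """Generic push-pull averaging (illustrative, not the paper's protocol).

    local_df: list of dicts, one per peer, mapping term -> local document frequency.
    Returns one dict per peer holding that peer's estimate of the network-wide
    *average* document frequency of every term.
    """
    rng = random.Random(seed)
    n = len(local_df)
    terms = set().union(*local_df)
    # Each peer starts from its own local counts (0 for terms it has never seen).
    est = [{t: float(df.get(t, 0)) for t in terms} for df in local_df]
    if n < 2:
        return est
    for _ in range(rounds):
        for i in range(n):
            # Assumption: the partner is drawn uniformly from all other peers.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            # Both peers replace their values with the pairwise mean, which
            # leaves the sum over all peers unchanged.
            for t in terms:
                mean = (est[i][t] + est[j][t]) / 2.0
                est[i][t] = est[j][t] = mean
    return est


def estimate_global_df(local_df, rounds=30, seed=0):
    """Scale the averaged values by the (assumed known) number of peers."""
    n = len(local_df)
    return [{t: v * n for t, v in peer.items()}
            for peer in gossip_average(local_df, rounds, seed)]


if __name__ == "__main__":
    peers = [
        {"p2p": 3, "gossip": 1},
        {"index": 4},
        {"gossip": 5},
        {"p2p": 2, "index": 1},
    ]
    for k, view in enumerate(estimate_global_df(peers)):
        print(k, {t: round(v, 3) for t, v in sorted(view.items())})
```

Because every exchange preserves the sum of the two values involved, the total over all peers never changes, and each peer's estimate moves toward the true global document frequency as the rounds progress. In a real unstructured network the number of peers would itself have to be estimated, and peers would only contact neighbours they know about.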