Aggregation of Document Frequencies in Unstructured P2P Networks

Authors:
Robert Neumayer; Christos Doulkeridis; Kjetil Nørvåg
Affiliations:
Norwegian University of Science and Technology, Trondheim, Norway 7491; Norwegian University of Science and Technology, Trondheim, Norway 7491; Norwegian University of Science and Technology, Trondheim, Norway 7491
Venue:
WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
Year:
2009


Abstract

Peer-to-peer (P2P) systems have recently been proposed for providing search and information retrieval facilities over distributed data sources, including web data. Terms and their document frequencies are the main building blocks of retrieval and therefore need to be computed, aggregated, and distributed throughout the system. This is a difficult task because, with skewed document distributions, the local view of each peer may not reflect the global document collection. Moreover, assembling the complete information centrally is not feasible, both because of the prohibitive cost of storage and maintenance and because of issues related to digital rights management. In this paper, we propose an efficient approach, based on a hierarchical overlay network, for aggregating the document frequencies of carefully selected terms. To this end, we examine unsupervised feature selection techniques at the individual peer level, in order to identify only a limited set of the most important terms for aggregation. We provide a theoretical analysis of the cost of our approach, and we conduct experiments on two document collections to measure the quality of the aggregated document frequencies.
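The general idea of aggregating document frequencies for a limited set of terms over a hierarchy can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify the feature selection technique or the overlay construction, so the sketch assumes that each peer simply keeps its `k` terms with the highest local document frequency and that the overlay is a tree given as nested dictionaries. The names `local_document_frequencies`, `select_terms`, `aggregate`, and the `overlay` layout are all hypothetical.

```python
from collections import Counter


def local_document_frequencies(documents):
    """Count, for each term, how many of a peer's documents contain it."""
    df = Counter()
    for doc in documents:
        # A set ensures each term is counted at most once per document.
        df.update(set(doc.lower().split()))
    return df


def select_terms(df, k):
    """Keep the k terms with the highest local document frequency.

    Assumption: top-k by local frequency stands in for the unsupervised
    feature selection step; ties are broken alphabetically.
    """
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:k])


def aggregate(node, k):
    """Sum the selected document frequencies bottom-up through a tree.

    Assumption: a node is a dict with optional 'documents' (list of str)
    and 'children' (list of nodes). No pruning is applied above peer level.
    """
    local = local_document_frequencies(node.get("documents", []))
    merged = Counter(select_terms(local, k))
    for child in node.get("children", []):
        merged.update(aggregate(child, k))
    return dict(merged)


# Hypothetical three-peer overlay: one root with two children.
overlay = {
    "documents": ["peer to peer search", "peer networks"],
    "children": [
        {"documents": ["peer search engine", "search index"]},
        {"documents": ["document frequency", "term frequency"]},
    ],
}

result = aggregate(overlay, k=2)
print(result)
```

In this toy overlay the term "peer" occurs in three documents overall, but the aggregated value is 2 because one peer did not select it locally. That gap between aggregated and true global frequencies is the kind of quality loss the experiments described in the abstract are meant to measure.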