Aggregation of Document Frequencies in Unstructured P2P Networks

  • Authors:
  • Robert Neumayer; Christos Doulkeridis; Kjetil Nørvåg

  • Affiliations:
  • Norwegian University of Science and Technology, Trondheim, Norway 7491 (all three authors)

  • Venue:
  • WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
  • Year:
  • 2009

Abstract

Peer-to-peer (P2P) systems have recently been proposed for providing search and information retrieval facilities over distributed data sources, including web data. Terms and their document frequencies are the main building blocks of retrieval and therefore need to be computed, aggregated, and distributed throughout the system. This is a challenging task, as the local view of each peer may not reflect the global document collection because of skewed document distributions. Moreover, assembling the complete information centrally is not feasible, both because of the prohibitive cost of storage and maintenance and because of issues related to digital rights management. In this paper, we propose an efficient approach for aggregating the document frequencies of carefully selected terms based on a hierarchical overlay network. To this end, we examine unsupervised feature selection techniques at the individual peer level in order to identify a limited set of the most important terms for aggregation. We provide a theoretical analysis of the cost of our approach, and we conduct experiments on two document collections to measure the quality of the aggregated document frequencies.
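
The general idea of selecting terms locally and summing their document frequencies up a hierarchy can be illustrated with a minimal Python sketch. It is not the method from the paper: the abstract does not specify the feature selection technique or the overlay construction, so the sketch assumes a plain top-k selection by local document frequency and a hand-built tree. The names `local_document_frequencies`, `select_terms`, `aggregate`, and the `overlay` dictionary layout are all hypothetical.

```python
from collections import Counter


def local_document_frequencies(docs):
    """Count, for each term, how many of this peer's documents contain it."""
    df = Counter()
    for doc in docs:
        # A set ensures each term is counted once per document.
        df.update(set(doc.lower().split()))
    return df


def select_terms(df, k):
    """Keep the k terms with the highest local document frequency.

    Assumption: top-k by frequency stands in for the unsupervised feature
    selection mentioned in the abstract. Ties are broken alphabetically.
    """
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return Counter(dict(ranked[:k]))


def aggregate(node, k):
    """Sum the selected frequencies of a node and all of its descendants.

    Assumption: a node is a dict with 'docs' and 'children' keys, and no
    further pruning happens at intermediate levels of the hierarchy.
    """
    total = select_terms(local_document_frequencies(node.get("docs", [])), k)
    for child in node.get("children", []):
        total.update(aggregate(child, k))  # Counter.update adds counts
    return total


# Hypothetical three-peer hierarchy: one parent with two children.
overlay = {
    "docs": ["peer to peer search", "peer search index"],
    "children": [
        {"docs": ["peer network overlay", "overlay routing"], "children": []},
        {"docs": ["search index", "search ranking"], "children": []},
    ],
}

aggregated = aggregate(overlay, k=2)
```

In this toy hierarchy the term "peer" occurs in three documents overall, but the aggregated frequency is 2, because one peer did not rank it among its two selected terms. This is the kind of gap between aggregated and true global frequencies that a quality evaluation would need to measure.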