A content model for evaluating peer-to-peer searching techniques

  • Authors: Brian F. Cooper
  • Affiliations: Georgia Institute of Technology
  • Venue: Proceedings of the 5th ACM/IFIP/USENIX International Conference on Middleware
  • Year: 2004

Abstract

Simulation studies are frequently used to evaluate new peer-to-peer searching techniques as well as existing techniques on new applications. Unless these studies are accurate in their modeling of queries and documents, they may not reflect how search techniques will perform in real networks, leading to incorrect conclusions about which techniques are best. We describe how to model content so that simulations produce accurate results. We present a content model for peer-to-peer networks, which consists of a tripartite graph with edges connecting queries to the documents they match, and documents to the peers they are stored at. Our model also includes a set of statistics describing how often queries match the same documents, and how often similar documents are stored at the same peer. We can construct our tripartite content model by running queries over live data stored at real Internet nodes, and simulation results show that searching techniques do indeed perform differently in simulations using this "real" content model versus a randomly generated model. We then present an algorithm for using real content gathered from a small set of peers (say, 1,000) to generate a synthetic content model for large simulated networks (say, 10,000 nodes or more). Finally, we use a synthetic model generated from World Wide Web documents and queries to compare the performance of several search algorithms that have been reported in the literature.
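
The tripartite structure described in the abstract can be pictured with a minimal sketch. This is an illustration only, not the paper's implementation: the class name `ContentModel`, its methods, and the sample identifiers are all hypothetical, and the Jaccard overlap is a simple stand-in for the paper's statistic on how often queries match the same documents.

```python
from collections import defaultdict


class ContentModel:
    """Hypothetical tripartite graph: queries -> documents -> peers."""

    def __init__(self):
        self.query_docs = defaultdict(set)  # query -> documents it matches
        self.doc_peers = defaultdict(set)   # document -> peers storing it

    def add_match(self, query, doc):
        """Add an edge from a query to a document it matches."""
        self.query_docs[query].add(doc)

    def add_copy(self, doc, peer):
        """Add an edge from a document to a peer that stores it."""
        self.doc_peers[doc].add(peer)

    def peers_answering(self, query):
        """Return every peer holding at least one document matching the query."""
        peers = set()
        for doc in self.query_docs.get(query, set()):
            peers |= self.doc_peers.get(doc, set())
        return peers

    def query_overlap(self, q1, q2):
        """Jaccard overlap of two queries' result sets (assumed stand-in statistic)."""
        a = self.query_docs.get(q1, set())
        b = self.query_docs.get(q2, set())
        union = a | b
        return len(a & b) / len(union) if union else 0.0


# Toy data: two queries, three documents, three peers.
model = ContentModel()
for query, doc in [("q1", "d1"), ("q1", "d2"), ("q2", "d2"), ("q2", "d3")]:
    model.add_match(query, doc)
for doc, peer in [("d1", "p1"), ("d2", "p1"), ("d2", "p2"), ("d3", "p3")]:
    model.add_copy(doc, peer)

print(sorted(model.peers_answering("q1")))
print(model.query_overlap("q1", "q2"))
```

In a simulation, a search technique would be scored by how many of the peers in `peers_answering(query)` it reaches, which is why the placement of documents on peers matters as much as the query-to-document matches.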