Uniform Data Sampling from a Peer-to-Peer Network

  • Authors:
  • Souptik Datta;Hillol Kargupta

  • Affiliations:
  • University of Maryland;University of Maryland

  • Venue:
  • ICDCS '07 Proceedings of the 27th International Conference on Distributed Computing Systems
  • Year:
  • 2007

Abstract

A uniform random sample is often useful in analyzing data. Taking a uniform sample is usually not a problem if all of the data resides in one location. However, if the data is distributed across a peer-to-peer (P2P) network, with different amounts of data at different peers, collecting a uniform sample of the data becomes a challenging task. Random sampling can be performed using a random walk, but because peers differ in their degree of connectivity and in the amount of data they own, a simple random walk gives a biased sample. In this paper, we propose a random walk-based sampling algorithm that can be used to sample data tuples uniformly from a large, unstructured P2P network. We model the random walk as a Markov chain and derive conditions that bound the length of the random walk necessary to achieve uniformity. A formal communication analysis shows that the communication cost of discovering a uniform data sample is logarithmic.
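
The bias described in the abstract, and one standard way to correct it, can be illustrated with a small sketch. The code below is not the authors' algorithm: it is a generic Metropolis-Hastings random walk whose stationary probability at each peer is proportional to the number of tuples that peer holds, so that picking a local tuple uniformly at the end of the walk yields an approximately uniform tuple overall. The names `mh_step`, `sample_tuple`, `NEIGHBORS`, and `SIZES`, the toy four-peer topology, and the walk length of 40 are all illustrative assumptions. The sketch also assumes an undirected, connected overlay in which every peer holds at least one tuple.

```python
import random
from collections import Counter

# Hypothetical toy overlay: peer id -> list of neighbor ids (undirected).
NEIGHBORS = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
# Hypothetical data sizes: peer id -> number of tuples held (all > 0).
SIZES = {0: 1, 1: 2, 2: 3, 3: 4}


def mh_step(peer, neighbors, sizes, rng):
    """One Metropolis-Hastings step with target pi(peer) proportional to sizes[peer]."""
    candidate = rng.choice(neighbors[peer])
    # Proposal is q(i -> j) = 1 / deg(i), so the acceptance ratio is
    # (size_j / deg_j) / (size_i / deg_i) = size_j * deg_i / (size_i * deg_j).
    ratio = (sizes[candidate] * len(neighbors[peer])) / (
        sizes[peer] * len(neighbors[candidate])
    )
    # Accept the move with probability min(1, ratio); otherwise stay put.
    return candidate if rng.random() < min(1.0, ratio) else peer


def sample_tuple(start, neighbors, sizes, walk_length, rng):
    """Walk walk_length steps from start, then pick a local tuple uniformly.

    Returns (peer id, local tuple index).
    """
    peer = start
    for _ in range(walk_length):
        peer = mh_step(peer, neighbors, sizes, rng)
    return peer, rng.randrange(sizes[peer])


if __name__ == "__main__":
    rng = random.Random(1)
    draws = Counter(
        sample_tuple(0, NEIGHBORS, SIZES, 40, rng) for _ in range(2000)
    )
    # Each of the 10 tuples should appear with roughly equal frequency.
    for key in sorted(draws):
        print(key, draws[key])
```

In this sketch a peer only needs its own degree and data size plus those of the proposed neighbor, so each step uses local information. How long the walk must be for the sample to be close to uniform depends on the topology and the data distribution; the fixed length used here is a placeholder, not a bound taken from the paper.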