Uniform Data Sampling from a Peer-to-Peer Network

Authors:
Souptik Datta;Hillol Kargupta
Affiliations:
University of Maryland;University of Maryland
Venue:
ICDCS '07 Proceedings of the 27th International Conference on Distributed Computing Systems
Year:
2007

Citing 0
Cited 12

An effective algorithm for mining 3-clusters in vertically partitioned data

Proceedings of the 17th ACM conference on Information and knowledge management
Maintaining replicas in unstructured P2P systems

CoNEXT '08 Proceedings of the 2008 ACM CoNEXT Conference
Distribution fairness in Internet-scale networks

ACM Transactions on Internet Technology (TOIT)
Uniform Sampling for Directed P2P Networks

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Distributed online aggregations

Proceedings of the VLDB Endowment
Raptor packets: a packet-centric approach to distributed raptor code design

ISIT'09 Proceedings of the 2009 IEEE international conference on Symposium on Information Theory - Volume 4
Time-based sampling of social network activity graphs

Proceedings of the Eighth Workshop on Mining and Learning with Graphs
Scalable Uniform Graph Sampling by Local Computation

SIAM Journal on Scientific Computing
Ubiquitous technologies

Ubiquitous knowledge discovery
Ubiquitous technologies

Ubiquitous knowledge discovery
Randomized load balancing by joining and splitting bins

Information Processing Letters
Packet-centric approach to distributed sparse-graph coding in wireless ad hoc networks

Ad Hoc Networks

Quantified Score

Hi-index	0.00

Abstract

A uniform random sample is often useful in analyzing data. Taking a uniform sample is usually straightforward if all of the data resides in one location. However, if the data is distributed across a peer-to-peer (P2P) network in which different peers hold different amounts of data, collecting a uniform sample becomes a challenging task. Random sampling can be performed with a random walk, but because peers differ in their degree of connectivity and in the amount of data they own, a simple random walk yields a biased sample. In this paper, we propose a random walk-based sampling algorithm that can be used to sample data tuples uniformly from a large, unstructured P2P network. We model the random walk as a Markov chain and derive conditions that bound the length of the random walk necessary to achieve uniformity. A formal communication analysis shows that the communication cost of discovering a uniform data sample is logarithmic.
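The abstract does not spell out the algorithm itself, so the following is only an illustrative sketch of the general idea, not the authors' method. It uses a standard Metropolis-Hastings correction so that the walk's stationary distribution over peers is proportional to the number of tuples each peer holds; picking one tuple uniformly at the final peer then gives a tuple that is uniform over the whole network, provided the walk is long enough. The names `neighbors`, `data`, `sizes`, and `walk_length`, and the assumptions of an undirected, connected overlay with no self-loops and at least one tuple per peer, are all hypothetical choices made for this example.

```python
import random


def mh_transition_matrix(neighbors, sizes):
    """Transition probabilities of a Metropolis-Hastings walk over peers.

    neighbors: dict peer -> list of adjacent peers (undirected, no self-loops).
    sizes:     dict peer -> number of tuples held (assumed >= 1).
    The target stationary distribution is proportional to sizes[peer].
    """
    peers = sorted(neighbors)
    P = {i: {j: 0.0 for j in peers} for i in peers}
    for i in peers:
        d_i = len(neighbors[i])
        for j in neighbors[i]:
            d_j = len(neighbors[j])
            # Propose j with probability 1/d_i, accept with the MH ratio.
            accept = min(1.0, (sizes[j] * d_i) / (sizes[i] * d_j))
            P[i][j] = accept / d_i
        # Rejected proposals keep the walk at the current peer.
        P[i][i] = 1.0 - sum(P[i][j] for j in neighbors[i])
    return P


def sample_tuple(neighbors, data, start, walk_length, rng=random):
    """Walk for walk_length steps, then return one tuple from the final peer.

    data: dict peer -> non-empty list of tuples stored at that peer.
    walk_length is left to the caller; this sketch does not compute the
    bound derived in the paper.
    """
    current = start
    for _ in range(walk_length):
        candidate = rng.choice(neighbors[current])
        d_cur = len(neighbors[current])
        d_cand = len(neighbors[candidate])
        accept = min(
            1.0,
            (len(data[candidate]) * d_cur) / (len(data[current]) * d_cand),
        )
        if rng.random() < accept:
            current = candidate
    return rng.choice(data[current])
```

The design choice here is to correct for both sources of bias named in the abstract at once: dividing by the degree removes the preference for well-connected peers, and multiplying by the local data size weights each peer by how many tuples it contributes. Computing the transition matrix explicitly is only practical for a toy overlay, but it makes it possible to check the stationary distribution on a small example before running the walk.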