Social sampling

Authors:
Anirban Dasgupta; Ravi Kumar; D. Sivakumar
Affiliations:
Yahoo! Research, Sunnyvale, CA, USA; Yahoo! Research, Sunnyvale, CA, USA; Yahoo! Research, Sunnyvale, CA, USA
Venue:
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2012

Abstract

We investigate a class of methods that we call "social sampling," in which participants in a poll respond with a summary of their friends' putative responses to the poll. Social sampling leads to a novel trade-off question: the savings in the number of samples (roughly the average degree of the network of participants) versus the systematic bias in the poll due to the network structure. We provide precise analyses of estimators that result from this idea. With non-uniform sampling of nodes and non-uniform weighting of neighbors' responses, we devise an ideal unbiased estimator. We show that the variance of this estimator is controlled by the second eigenvalue of the normalized Laplacian of the network (the network structure penalty) and the correlation between node degrees and the property being measured (the effective savings factor). In addition, we present a sequence of approximate estimators that are simpler, more realistic, or both, and analyze their performance. Experiments on large real-world networks show that social sampling is a powerful paradigm for obtaining accurate estimates with very few samples. At the same time, our results urge caution in interpreting recent results about "expectation vs. intent polling".
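
To make the idea of "non-uniform sampling of nodes and non-uniform weighting of neighbors' responses" concrete, here is a minimal Python sketch of one estimator of that shape. It is an illustration under stated assumptions, not necessarily the estimator analyzed in the paper: the network is undirected with no isolated nodes, nodes are sampled with probability proportional to degree, and each sampled node reports its neighbors' answers weighted by the inverse of each neighbor's degree. All names (`social_report`, `social_sample_estimate`, `exact_expectation`, `toy_graph`, `toy_f`) are hypothetical.

```python
import random


def true_mean(f):
    """Population average of the property f over all nodes."""
    return sum(f.values()) / len(f)


def social_report(graph, f, v):
    """Summary given by node v: neighbors' answers, each weighted by 1/degree.

    Assumption: every node has at least one neighbor.
    """
    return sum(f[u] / len(graph[u]) for u in graph[v])


def _scaled_report(graph, f, v, n, two_m):
    # Rescaling that makes the degree-proportional sample unbiased:
    # E[(2m/n) * report(v) / d_v] = (1/n) * sum_u f(u).
    return (two_m / n) * social_report(graph, f, v) / len(graph[v])


def social_sample_estimate(graph, f, num_samples, rng):
    """Average of rescaled reports from degree-proportional samples."""
    nodes = list(graph)
    degrees = [len(graph[v]) for v in nodes]
    n, two_m = len(nodes), sum(degrees)
    picks = rng.choices(nodes, weights=degrees, k=num_samples)
    return sum(_scaled_report(graph, f, v, n, two_m) for v in picks) / num_samples


def exact_expectation(graph, f):
    """Expectation of a single rescaled report, computed by enumeration."""
    n = len(graph)
    two_m = sum(len(nbrs) for nbrs in graph.values())
    return sum(
        (len(graph[v]) / two_m) * _scaled_report(graph, f, v, n, two_m)
        for v in graph
    )


# Hypothetical toy network (adjacency lists) and binary poll answers.
toy_graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
toy_f = {0: 1, 1: 0, 2: 1, 3: 0}

if __name__ == "__main__":
    rng = random.Random(0)
    print("true mean:        ", true_mean(toy_f))
    print("exact expectation:", exact_expectation(toy_graph, toy_f))
    print("sampled estimate: ", social_sample_estimate(toy_graph, toy_f, 2000, rng))
```

In this sketch the expectation of a single rescaled report equals the population mean, because each node u appears in the reports of its d_u neighbors with weight 1/d_u. How quickly the sampled estimate concentrates depends on the network and on the property being measured, which is the variance question the abstract addresses.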