We investigate a class of methods that we call "social sampling," in which participants in a poll respond with a summary of their friends' putative responses to the poll. Social sampling leads to a novel trade-off question: the savings in the number of samples (roughly the average degree of the network of participants) vs. the systematic bias in the poll due to the network structure. We provide precise analyses of estimators that result from this idea. With non-uniform sampling of nodes and non-uniform weighting of neighbors' responses, we devise an ideal unbiased estimator. We show that the variance of this estimator is controlled by the second eigenvalue of the normalized Laplacian of the network (the network structure penalty) and the correlation between node degrees and the property being measured (the effective savings factor). In addition, we present a sequence of approximate estimators that are simpler, more realistic, or both, and analyze their performance. Experiments on large real-world networks show that social sampling is a powerful paradigm for obtaining accurate estimates with very few samples. At the same time, our results urge caution in interpreting recent results about "expectation vs. intent polling."
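To make the idea of "non-uniform sampling of nodes and non-uniform weighting of neighbors' responses" concrete, here is a minimal Python sketch of one estimator that fits that description. The abstract does not state the estimator's formula, so the specific choices below are assumptions made for illustration: a node is sampled with probability proportional to its degree, and each neighbor's value is weighted by the inverse of that neighbor's degree. The function names (`node_response`, `expected_estimate`, `social_sample_estimate`) and the toy graph are hypothetical. Under these assumptions, and provided no node is isolated, the expected response equals the population mean.

```python
import random


def node_response(adj, f, degrees, two_m, n, v):
    """Weighted summary reported by node v about its neighbors.

    Assumed weighting (not taken from the source): neighbor u's value
    f[u] is scaled by 2m / (n * deg(u)), then averaged over v's neighbors.
    """
    weighted = sum(f[u] * two_m / (n * degrees[u]) for u in adj[v])
    return weighted / degrees[v]


def expected_estimate(adj, f):
    """Exact expectation of one response when v is drawn with
    probability deg(v) / 2m. Used to check unbiasedness."""
    n = len(adj)
    degrees = {v: len(adj[v]) for v in adj}
    two_m = sum(degrees.values())
    return sum(
        (degrees[v] / two_m) * node_response(adj, f, degrees, two_m, n, v)
        for v in adj
    )


def social_sample_estimate(adj, f, num_samples, rng):
    """Monte Carlo estimate of the mean of f from num_samples responses."""
    nodes = list(adj)
    n = len(nodes)
    degrees = {v: len(adj[v]) for v in nodes}
    two_m = sum(degrees.values())
    # Degree-proportional sampling of respondents (with replacement).
    sampled = rng.choices(nodes, weights=[degrees[v] for v in nodes], k=num_samples)
    total = sum(node_response(adj, f, degrees, two_m, n, v) for v in sampled)
    return total / num_samples


if __name__ == "__main__":
    # Hypothetical toy network: undirected adjacency lists, no isolated nodes.
    adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
    f = {0: 1, 1: 0, 2: 1, 3: 0}  # binary property being polled
    print("true mean:", sum(f.values()) / len(f))
    print("expected response:", expected_estimate(adj, f))
    print("sampled estimate:", social_sample_estimate(adj, f, 2000, random.Random(0)))
```

The sketch only checks unbiasedness. It does not reproduce the variance analysis in terms of the normalized Laplacian's second eigenvalue, nor the simpler approximate estimators mentioned above.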