This paper proposes using the simple random walk, a sampling method supported by most online social networks (OSNs), to estimate a variety of properties of large OSNs. We show that, because OSNs are scale-free, estimators derived from random walk sampling are much better than those based on uniform random sampling, even if the notoriously high cost of obtaining uniform random samples is disregarded. The paper first proposes using the harmonic mean to estimate the average degree of an OSN. An accurate estimate of the average degree in turn allows other properties to be estimated, such as the population size, the heterogeneity of the degrees, the number of friends of friends, the threshold value for messages to reach a large component, and the Gini coefficient of the population. The method is validated on the complete Twitter data set from 2009, which contains 42 million nodes and 1.5 billion edges.
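The harmonic mean step can be illustrated with a minimal sketch. It relies on the standard fact that a simple random walk on a connected undirected graph visits nodes with probability proportional to their degree, so the harmonic mean of the sampled degrees corrects that bias while the arithmetic mean overestimates the average degree. The function names, the toy star graph, and the walk length below are assumptions made for illustration and are not taken from the paper.

```python
import random


def random_walk_degrees(adj, start, steps, rng):
    """Return the degrees of the nodes visited by a simple random walk.

    adj maps each node to a list of its neighbors (undirected graph).
    """
    degrees = []
    node = start
    for _ in range(steps):
        neighbors = adj[node]
        degrees.append(len(neighbors))
        node = rng.choice(neighbors)  # move to a uniformly chosen neighbor
    return degrees


def harmonic_mean_degree(degrees):
    """Harmonic mean of sampled degrees: n / sum(1 / d_i)."""
    return len(degrees) / sum(1.0 / d for d in degrees)


# Hypothetical toy graph: a star with one hub (degree 4) and four leaves
# (degree 1). Its true average degree is 8 / 5 = 1.6.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

sample = random_walk_degrees(star, start=0, steps=1000, rng=random.Random(0))
estimate = harmonic_mean_degree(sample)     # degree-bias corrected
arithmetic = sum(sample) / len(sample)      # naive mean of sampled degrees

print(f"harmonic mean estimate: {estimate:.3f}")
print(f"arithmetic mean of sample: {arithmetic:.3f}")
```

On this toy graph the walk alternates between the hub and a leaf, so the harmonic mean recovers the true average degree of 1.6, while the arithmetic mean of the same sample is 2.5. On a real OSN the walk would need a burn-in period, and the estimate would carry sampling error.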