The properties of online social networks (OSNs) are of great interest to the general public as well as to IT professionals. Often the raw data are not available, and the summaries released by service providers are sketchy. Sampling is therefore needed to reveal the hidden properties of the underlying data. While uniform random sampling is often preferred, some properties, such as the top bloggers, need to be obtained using PPS (probability proportional to size) sampling. Although PPS sampling can be approximated by a simple random walk, this is inefficient because only one sample is taken at each step. This paper introduces an efficient sampling method, called star sampling, that takes all the neighbours of each sampled node as valid samples. It is more efficient than random walk sampling by a factor of the average degree. We derive the estimator and its variance, and verify the results locally on six large real networks, where the ground truth is known and the estimates can be evaluated. We then apply our method to Weibo, the Chinese counterpart of Twitter, whose properties are rarely studied despite its enormous size and influence. Along with conventional metrics such as size and degree distributions, we demonstrate that star sampling can identify ten thousand top bloggers efficiently. In general, the estimated follower numbers are consistent with the claimed numbers, but in some cases the follower numbers are inflated by a factor of up to 132.
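The abstract does not give the estimator itself, so the following Python sketch only illustrates its two mechanical claims, under stated assumptions. If centre nodes are drawn uniformly from an undirected graph of N nodes and *all* their neighbours are kept, a node v shows up once for every one of its neighbours that is drawn, i.e. about deg(v)/N times per draw. The samples are therefore PPS by degree, and each step yields roughly the average degree in samples, against one for a simple random walk. (For a directed follower graph, taking all accounts followed by a uniformly chosen user would analogously be proportional to follower count.) The preferential-attachment test graph, the function names, and the 1/degree (harmonic-mean) re-weighting used to estimate the mean degree are illustrative assumptions, not the paper's derived estimator or its variance.

```python
import random


def make_graph(n, m=3, seed=0):
    """Hypothetical test graph: preferential attachment, m edges per new node.

    Returns an undirected adjacency dict {node: set(neighbours)} with a skewed
    degree distribution and minimum degree m (so no isolated nodes).
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    ends = []  # every edge endpoint, so rng.choice(ends) is degree-proportional

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)
        ends.extend((u, v))

    for u in range(m + 1):  # seed clique on nodes 0..m
        for v in range(u + 1, m + 1):
            link(u, v)
    for v in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # m distinct, degree-proportional targets
            targets.add(rng.choice(ends))
        for u in targets:
            link(u, v)
    return adj


def star_sample(adj, n_centres, rng):
    """Star sampling sketch: pick centres uniformly, keep ALL their neighbours.

    Node v is returned about deg(v)/N times per centre, i.e. PPS by degree.
    """
    nodes = list(adj)
    samples = []
    for _ in range(n_centres):
        centre = rng.choice(nodes)
        samples.extend(adj[centre])
    return samples


def random_walk_sample(adj, n_steps, rng):
    """Simple random walk: one (asymptotically degree-proportional) sample per step."""
    v = rng.choice(list(adj))
    samples = []
    for _ in range(n_steps):
        v = rng.choice(tuple(adj[v]))
        samples.append(v)
    return samples


def estimate_mean_degree(adj, samples):
    """Harmonic-mean estimator: re-weight each PPS sample by 1/degree."""
    return len(samples) / sum(1.0 / len(adj[v]) for v in samples)


def top_k(adj, nodes, k):
    """The k highest-degree nodes among `nodes` (ties broken by node id)."""
    return sorted(set(nodes), key=lambda v: (-len(adj[v]), v))[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    adj = make_graph(20000, m=3, seed=1)
    true_mean = sum(len(nb) for nb in adj.values()) / len(adj)
    star = star_sample(adj, 500, rng)         # 500 steps, ~500 * mean degree samples
    walk = random_walk_sample(adj, 500, rng)  # 500 steps, exactly 500 samples
    print(f"true mean degree: {true_mean:.2f}")
    print(f"star: {len(star)} samples, estimate {estimate_mean_degree(adj, star):.2f}")
    print(f"walk: {len(walk)} samples, estimate {estimate_mean_degree(adj, walk):.2f}")
    hits = set(top_k(adj, star, 10)) & set(top_k(adj, adj, 10))
    print(f"true top-10 nodes found by star sampling: {len(hits)}")
```

Because high-degree nodes are neighbours of many centres, they are very likely to appear among the star samples, which is why this style of PPS sampling suits finding top bloggers. In a real crawl each sampled neighbour's degree (follower count) would have to be fetched rather than read from a local adjacency dict.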