Detect inflated follower numbers in OSN using star sampling

  • Authors:
  • Hao Wang;Jianguo Lu

  • Affiliations:
  • University of Windsor, Windsor, Ontario, Canada;University of Windsor, Windsor, Ontario, Canada

  • Venue:
  • Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
  • Year:
  • 2013

Abstract

The properties of online social networks (OSNs) are of great interest to the general public as well as to IT professionals. Often the raw data are not available, and the summaries released by the service providers are sketchy. Thus sampling is needed to reveal the hidden properties of the underlying data. While uniform random sampling is often preferred, some properties, such as the top bloggers, need to be obtained using PPS (probability proportional to size) sampling. Although PPS sampling can be approximated using a simple random walk, this is not efficient because only one sample is taken at each step. This paper introduces an efficient sampling method, called star sampling, that takes all the neighbours of a sampled node as valid samples. It is more efficient than random walk sampling by a factor of the average degree. We derive the estimator and its variance, and verify the result locally on six large real-world networks where the ground truth is known and the estimates can be evaluated. We then apply our method to Weibo, the Chinese counterpart of Twitter, whose properties are rarely studied despite its enormous size and influence. Along with conventional metrics such as size and degree distributions, we demonstrate that star sampling can identify ten thousand top bloggers efficiently. In general, the estimated follower number is consistent with the claimed number, but there are cases where follower numbers are inflated by a factor of up to 132.
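The idea behind star sampling can be illustrated with a small sketch. This is not the paper's estimator or variance derivation; it is a minimal illustration under stated assumptions: users ("stars") are drawn uniformly at random with replacement, each star's full followee list can be read, and the total number of users N is known. A user v with d_v followers then appears in a random star's followee list with probability d_v / N, so every followee of a star is a PPS sample by follower count, and one star yields as many samples as it has followees rather than the single sample a random-walk step gives. The function names `star_sample`, `estimate_followers` and `inflation_ratio` are hypothetical.

```python
import random
from collections import Counter


def star_sample(followees, n_stars, rng):
    """Hypothetical helper: draw n_stars users uniformly with replacement
    and count how often each user appears in the stars' followee lists.

    Assumption: followees maps every user to a duplicate-free list of the
    users they follow. A user v with d_v followers is hit with probability
    d_v / N per star (N = number of users), i.e. PPS by follower count.
    """
    users = list(followees)
    counts = Counter()
    for _ in range(n_stars):
        star = rng.choice(users)
        # Take ALL neighbours of the star as samples, not just one.
        counts.update(followees[star])
    return counts


def estimate_followers(counts, n_stars, n_users):
    """Scale hit counts back to follower numbers.

    Since E[count_v] = n_stars * d_v / N, the estimate count_v * N / n_stars
    is unbiased for d_v under the assumptions above.
    """
    return {v: c * n_users / n_stars for v, c in counts.items()}


def inflation_ratio(claimed, estimated):
    """Claimed follower number divided by the sampled estimate."""
    return claimed / estimated
```

On a toy graph of 100 users where users 1 to 99 follow user 0 and the even users 2 to 98 also follow user 1, 20,000 stars give estimates close to the true follower counts of 99 and 49; a claimed count of 490 for user 1 would then show up as roughly a tenfold inflation.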