Albatross sampling: robust and effective hybrid vertex sampling for social graphs

  • Authors:
  • Long Jin; Yang Chen; Pan Hui; Cong Ding; Tianyi Wang; Athanasios V. Vasilakos; Beixing Deng; Xing Li

  • Affiliations:
  • Tsinghua University, Beijing, China; University of Goettingen, Goettingen, Germany; Deutsche Telekom Laboratories/TU-Berlin, Berlin, Germany; University of Goettingen, Goettingen, Germany; Tsinghua University, Beijing, China; University of Western Macedonia, Kozani, Greece; Tsinghua University, Beijing, China; Tsinghua University, Beijing, China

  • Venue:
  • HotPlanet '11 Proceedings of the 3rd ACM international workshop on MobiArch
  • Year:
  • 2011

Abstract

Online Social Networks (OSNs) have become enormously popular, and the study of social graphs attracts the interest of a large number of researchers. One critical challenge is the sheer size of social graphs, which makes graph analysis, or even data crawling, extremely time-consuming and sometimes impossible to complete. Graph sampling algorithms have therefore been introduced to obtain a smaller subgraph that preserves the properties of the original graph. Breadth-First Sampling (BFS) is widely used for graph sampling, but it is biased towards high-degree vertices. Metropolis-Hastings Random Walk (MHRW), which was proposed to obtain unbiased samples of a social graph, requires the graph to be well connected. In this paper, we propose a vertex sampling algorithm called Albatross Sampling (AS), which introduces a random jump strategy into MHRW. The embedded random jump makes the sampling procedure more flexible and prevents it from being trapped in a locally well-connected part of the graph. Our evaluation shows that AS performs significantly better than MHRW and BFS on both tightly and loosely connected graphs. On the one hand, under the same resource budget, AS estimates the degree distribution with a much lower Normalized Mean Square Error (NMSE). On the other hand, AS requires a much smaller resource budget to reach an acceptable estimate of the degree distribution.
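The abstract describes AS only at a high level. What follows is a minimal Python sketch, built on our own assumptions, of one way to add a random jump to MHRW and then score a degree-distribution estimate with NMSE. It is not the authors' implementation. The names `albatross_sample`, `p_jump`, `degree_distribution`, `nmse` and `two_cliques` are hypothetical. Several other details are also assumptions rather than the paper's specification: the fixed per-step jump probability, the rule that every step (jumps included) consumes one unit of budget, the ability to draw a vertex uniformly at random, and the NMSE definition (a form commonly used in OSN sampling work).

The sketch relies on one property. MHRW's stationary distribution is uniform over vertices, and a jump to a uniformly random vertex leaves any distribution uniform. A mixture of the two therefore keeps the uniform stationary distribution while letting the walk escape a poorly connected region.

```python
import math
import random
from collections import Counter


def albatross_sample(adj, budget, p_jump=0.05, seed=None):
    """MHRW with uniform random jumps (hypothetical sketch, not the paper's code).

    adj:    dict mapping each vertex to a list of its neighbours; the graph
            is assumed undirected, so adjacency must be symmetric.
    budget: number of samples to draw (one per step, jumps included).
    p_jump: hypothetical per-step probability of a random jump.
    """
    rng = random.Random(seed)
    vertices = list(adj)
    v = rng.choice(vertices)
    samples = []
    while len(samples) < budget:
        if rng.random() < p_jump:
            # Random jump: move to a vertex chosen uniformly at random.
            v = rng.choice(vertices)
        elif adj[v]:
            # MHRW step: propose a uniform neighbour w, accept it with
            # probability min(1, deg(v) / deg(w)), otherwise stay at v.
            w = rng.choice(adj[v])
            if rng.random() < len(adj[v]) / len(adj[w]):
                v = w
        # An isolated vertex simply stays put unless a jump occurs.
        samples.append(v)
    return samples


def degree_distribution(adj, vertices):
    """Fraction of the given vertices (or samples) having each degree k."""
    counts = Counter(len(adj[v]) for v in vertices)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def nmse(true_dist, estimates, k):
    """Assumed NMSE: sqrt(E[(theta_hat_k - theta_k)^2]) / theta_k over runs."""
    theta = true_dist[k]
    mse = sum((est.get(k, 0.0) - theta) ** 2 for est in estimates) / len(estimates)
    return math.sqrt(mse) / theta


def two_cliques(n):
    """Two n-cliques joined by one bridge edge: a loosely connected toy graph."""
    adj = {v: [] for v in range(2 * n)}
    for base in (0, n):
        for a in range(base, base + n):
            adj[a] = [b for b in range(base, base + n) if b != a]
    adj[0].append(n)  # bridge edge 0 -- n
    adj[n].append(0)
    return adj


if __name__ == "__main__":
    g = two_cliques(5)
    truth = degree_distribution(g, g)
    for p in (0.0, 0.05):  # p_jump = 0.0 reduces the sketch to plain MHRW
        runs = [degree_distribution(g, albatross_sample(g, 200, p, seed=s))
                for s in range(100)]
        print(f"p_jump={p}: NMSE at k=5 is {nmse(truth, runs, 5):.3f}")
```

Setting `p_jump=0.0` recovers plain MHRW, so the same harness can compare the two walks on a graph with a bottleneck such as `two_cliques`. The toy graph and budget here are illustrative only and do not reproduce the paper's evaluation.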