Estimating and sampling graphs with multidimensional random walks

  • Authors:
  • Bruno Ribeiro; Don Towsley

  • Affiliations:
  • University of Massachusetts Amherst, Amherst, MA, USA; University of Massachusetts Amherst, Amherst, MA, USA

  • Venue:
  • IMC '10 Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
  • Year:
  • 2010

Abstract

Estimating characteristics of large graphs via sampling is a vital part of the study of complex networks. Current sampling methods, such as (independent) random vertex sampling and random walks, are useful but have drawbacks. Random vertex sampling may require too many resources (time, bandwidth, or money). Random walks, which normally require fewer resources per sample, can suffer from large estimation errors when the graph is disconnected or loosely connected. In this work we propose a new m-dimensional random walk that uses m dependent random walkers. We show that the proposed sampling method, which we call Frontier sampling, exhibits all of the nice sampling properties of a regular random walk. At the same time, our simulations over large real-world graphs show that, in the presence of disconnected or loosely connected components, Frontier sampling exhibits lower estimation errors than regular random walks. We also show that Frontier sampling is more suitable than random vertex sampling for sampling the tail of the degree distribution of the graph.
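
The abstract does not spell out how the m walkers depend on one another. The following is a minimal Python sketch of one way to realize the idea, assuming that at each step a walker is chosen with probability proportional to the degree of its current vertex and then moves along a uniformly chosen incident edge. That selection rule, the adjacency-dictionary input, and the names `frontier_sampling` and `estimate_degree_distribution` are assumptions made for illustration, not details taken from the abstract. The estimator reweights each visited vertex by the inverse of its degree, the standard correction for the degree bias of random-walk samples.

```python
import random
from collections import Counter


def frontier_sampling(adj, m, budget, rng=None):
    """Sample `budget` edges with m dependent walkers (illustrative sketch).

    adj: dict mapping each vertex to a list of its neighbours (undirected).
    Assumption: walkers are picked proportionally to their current degree.
    """
    rng = rng or random.Random()
    # Skip isolated vertices, since a walker could never leave them.
    vertices = [v for v in adj if adj[v]]
    # Start the m walkers at uniformly chosen vertices (with replacement).
    walkers = [rng.choice(vertices) for _ in range(m)]
    edges = []
    for _ in range(budget):
        # Choose which walker moves, weighted by the degree of its vertex.
        weights = [len(adj[u]) for u in walkers]
        i = rng.choices(range(m), weights=weights, k=1)[0]
        u = walkers[i]
        # Move that walker along a uniformly chosen incident edge.
        v = rng.choice(adj[u])
        edges.append((u, v))
        walkers[i] = v
    return edges


def estimate_degree_distribution(adj, edges):
    """Estimate the fraction of vertices with each degree from sampled edges.

    Visited vertices are over-represented in proportion to their degree,
    so each visit is weighted by 1/degree before normalizing.
    """
    weight = Counter()
    for _, v in edges:
        d = len(adj[v])
        weight[d] += 1.0 / d
    total = sum(weight.values())
    return {d: w / total for d, w in sorted(weight.items())}
```

As a usage example, on a graph made of two disconnected triangles, `frontier_sampling(adj, m=4, budget=200)` returns 200 sampled edges, and several walkers started at random make it likely that both components are covered, whereas a single walker would stay in the component where it started.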