On the bias of traceroute sampling: Or, power-law degree distributions in regular graphs

  • Authors:
  • Dimitris Achlioptas;Aaron Clauset;David Kempe;Cristopher Moore

  • Affiliations:
  • University of California, Santa Cruz, CA;University of New Mexico, Albuquerque, and the Santa Fe Institute, New Mexico;University of Southern California, Los Angeles, CA;University of New Mexico, Albuquerque, and the Santa Fe Institute, New Mexico

  • Venue:
  • Journal of the ACM (JACM)
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Understanding the graph structure of the Internet is a crucial step for building accurate network models and designing efficient algorithms for Internet applications. Yet, obtaining this graph structure can be a surprisingly difficult task, as edges cannot be explicitly queried. For instance, empirical studies of the network of Internet Protocol (IP) addresses typically rely on indirect methods like traceroute to build what are approximately single-source, all-destinations, shortest-path trees. These trees only sample a fraction of the network's edges, and a paper by Lakhina et al. [2003] found empirically that the resulting sample is intrinsically biased. Further, in simulations, they observed that the degree distribution under traceroute sampling exhibits a power law even when the underlying degree distribution is Poisson. In this article, we study the bias of traceroute sampling mathematically and, for a very general class of underlying degree distributions, explicitly calculate the distribution that will be observed. As example applications of our machinery, we prove that traceroute sampling finds power-law degree distributions in both δ-regular and Poisson-distributed random graphs. Thus, our work puts the observations of Lakhina et al. on a rigorous footing, and extends them to nearly arbitrary degree distributions.