The Web as a graph: How far we are

  • Authors:
  • Debora Donato;Luigi Laura;Stefano Leonardi;Stefano Millozzi

  • Affiliations:
  • University of Rome, Roma, Italy;University of Rome, Roma, Italy;University of Rome, Roma, Italy;University of Rome, Roma, Italy

  • Venue:
  • ACM Transactions on Internet Technology (TOIT)
  • Year:
  • 2007

Abstract

In this article we present an experimental study of the properties of webgraphs. We study a large crawl from 2001 of 200M pages and about 1.4 billion edges, made available by the WebBase project at Stanford, as well as several synthetic webgraphs generated according to various recently proposed models. We investigate several topological properties of such graphs, including the number of bipartite cores and strongly connected components, the distribution of degrees and PageRank values, and some correlations; we present a comparison study of the models against these measures. Our findings are that (i) the WebBase sample differs slightly from the (older) samples studied in the literature, and (ii) although these models do not capture all of its properties, they do exhibit some peculiar behaviors not found, for example, in the models from classical random graph theory. Moreover, we developed a software library able to generate and measure massive graphs in secondary memory; this library is publicly available under the GPL license. We discuss its implementation and some computational issues related to secondary memory graph algorithms.