Level-Biased statistics in the hierarchical structure of the web

  • Authors:
  • Guang Feng; Tie-Yan Liu; Xu-Dong Zhang; Wei-Ying Ma

  • Affiliations:
  • Microsoft Research Asia, Beijing, P.R. China; MSPLAB, Department of Electronic Engineering, Tsinghua University, Beijing, P.R. China; Microsoft Research Asia, Beijing, P.R. China; MSPLAB, Department of Electronic Engineering, Tsinghua University, Beijing, P.R. China

  • Venue:
  • PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2006

Abstract

In the web search and mining literature, researchers have traditionally treated the World Wide Web as a flat network in which every page and every hyperlink is handled identically. However, it is common knowledge that the Web has a natural hierarchical structure defined by the URLs of its pages. Exploring this hierarchical structure, we found several level-biased characteristics of the Web. First, the distribution of pages over levels has a spindle shape. Second, the average indegree in each level decreases sharply as the level goes deeper. Third, although the indegree distributions in deeper levels obey the same power law as the global indegree distribution, the top levels show quite different statistical characteristics. We believe that these new discoveries might be essential to the Web, and that by making use of them, current web search and mining technologies could be improved and better services provided to web users.
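
As a rough illustration of the per-level statistics the abstract mentions (page counts and average indegree per level), the following Python sketch may help. The abstract does not say how a page's level is derived from its URL, so the sketch assumes that the level is the number of non-empty path segments, with a site root at level 0. The function names `url_level` and `level_statistics` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from urllib.parse import urlparse


def url_level(url):
    """Return the assumed level of a URL: 0 for a site root, +1 per path segment."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def level_statistics(pages, links):
    """Return {level: (page_count, average_indegree)} for a small crawl.

    pages: iterable of page URLs.
    links: iterable of (source_url, target_url) pairs.
    Links whose target is not in `pages` are ignored.
    """
    # Count incoming links for every link target.
    indegree = defaultdict(int)
    for _source, target in links:
        indegree[target] += 1

    # Aggregate page counts and indegree totals by level.
    page_count = defaultdict(int)
    indegree_sum = defaultdict(int)
    for url in set(pages):
        level = url_level(url)
        page_count[level] += 1
        indegree_sum[level] += indegree[url]

    return {
        level: (page_count[level], indegree_sum[level] / page_count[level])
        for level in sorted(page_count)
    }
```

Plotting the page counts against the level would show whether a given crawl has the spindle shape described above, and the averages give the per-level indegree trend. Other definitions of level (for example, counting subdomains) would change the numbers.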