On the peninsula phenomenon in web graph and its implications on web search

  • Authors:
  • Tao Meng; Hong-Fei Yan

  • Affiliations:
  • Lab of Computer Networks and Distributed System, Department of Computer Science and Technology, Peking University, Beijing, China (both authors)

  • Venue:
  • Computer Networks: The International Journal of Computer and Telecommunications Networking
  • Year:
  • 2007

Abstract

Web masters usually place certain web pages, such as home pages and index pages, in front of others. Under such a design, a visitor must go through these pages to reach the destination pages, much as a traveler reaches an inner town of a peninsula only through the towns at its edge. In this paper, we try to validate that peninsulas are a universal phenomenon in the World-Wide Web, and clarify how this phenomenon can be used to enhance web search and to study web connectivity problems. For this purpose, we model the web as a directed graph and give a proper definition of peninsulas based on this graph. We also present an efficient algorithm for finding web peninsulas. Using data collected from the Chinese web by the Tianwang search engine, we perform an experiment on the distribution of peninsula sizes and their correlations with the PageRank values, outdegrees, and indegrees of the ties to outside vertices. The results show that the peninsula structure of a web graph can greatly expedite the computation of PageRank values, and that it can also significantly affect the link extraction capability and information coverage of web crawlers.