On the peninsula phenomenon in web graph and its implications on web search

Authors: Tao Meng; Hong-Fei Yan
Affiliation (both authors): Lab of Computer Networks and Distributed System, Department of Computer Science and Technology, Peking University, Beijing, China
Venue: Computer Networks: The International Journal of Computer and Telecommunications Networking
Year: 2007

Abstract

Webmasters usually place certain pages, such as home pages and index pages, in front of others. Under such a design, a visitor must pass through these pages to reach the destination pages, much as a traveler reaches an inner town of a peninsula only by passing through the towns at its edge. In this paper, we try to validate that peninsulas are a universal phenomenon in the World Wide Web, and clarify how this phenomenon can be used to enhance web search and to study web connectivity problems. To this end, we model the web as a directed graph and give a proper definition of peninsulas based on this graph. We also present an efficient algorithm for finding web peninsulas. Using data collected from the Chinese web by the Tianwang search engine, we study the distribution of peninsula sizes and their correlation with the PageRank values, outdegrees, or indegrees of the ties with outside vertices. The results show that the peninsula structure of a web graph can greatly expedite the computation of PageRank values, and that it can also significantly affect the link extraction capability and information coverage of web crawlers.
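
The abstract does not state the formal definition of a peninsula or the algorithm used to find one, so the following is only a minimal sketch of the intuition under an assumed definition: given a starting page (here called `seed`) and a candidate gateway page (here called `tie`), the "peninsula behind the tie" is taken to be the set of pages that can be reached from the seed only by passing through the tie. The function names, the brute-force two-pass search, and the toy graph are all hypothetical illustrations, not the paper's definition, method, or data.

```python
from collections import deque


def reachable(graph, seed, blocked=None):
    """Breadth-first search from `seed`, never entering the `blocked` vertex."""
    if seed == blocked:
        return set()
    seen = {seed}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v != blocked and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def peninsula_behind(graph, seed, tie):
    """Pages reachable from `seed` only via `tie` (an assumed definition)."""
    full = reachable(graph, seed)
    without_tie = reachable(graph, seed, blocked=tie)
    return full - without_tie - {tie}


# Hypothetical toy site: adjacency lists of a small directed web graph.
web = {
    "portal": ["home", "news"],
    "home": ["index", "about"],
    "index": ["page1", "page2"],
    "page1": ["home"],
    "page2": [],
    "about": [],
    "news": ["about"],
}

inner = peninsula_behind(web, "portal", "home")
print(sorted(inner))
```

In this toy graph, `about` is not behind `home` because it is also linked from `news`, whereas `index`, `page1`, and `page2` can be reached only through `home`. Running two searches per candidate tie is far too slow for a real web graph; it is meant only to make the "pass through the edge towns" picture concrete.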