Webmasters usually place certain web pages, such as home pages and index pages, in front of others. Under such a design, a visitor must pass through some pages to reach the destination pages, much as a traveler reaches an inner town of a peninsula only by passing through the towns at its edge. In this paper, we aim to validate that peninsulas are a universal phenomenon in the World Wide Web and to clarify how this phenomenon can be used to enhance web search and to study web connectivity problems. For this purpose, we model the web as a directed graph and give a proper definition of peninsulas based on this graph. We also present an efficient algorithm for finding web peninsulas. Using data collected from the Chinese web by the Tianwang search engine, we perform an experiment on the distribution of peninsula sizes and their correlations with the PageRank values, outdegrees, and indegrees of the ties with outside vertices. The results show that the peninsula structure of a web graph can greatly expedite the computation of PageRank values, and that it can also significantly affect the link extraction capability and information coverage of web crawlers.
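To make the peninsula idea concrete, here is a minimal Python sketch. The abstract does not state the paper's formal definition or its algorithm, so this is only an illustration under an assumed, conservative reading: a peninsula rooted at a gateway page consists of the gateway plus every page whose in-links all come from pages already inside the peninsula. The function name `peninsula`, the edge-list input, and the sample page names are hypothetical.

```python
from collections import defaultdict


def peninsula(edges, gateway):
    """Return the pages reachable only through `gateway`.

    ASSUMPTION: this closure rule is an illustrative stand-in, not the
    paper's definition. A page joins the set only when every one of its
    in-links originates inside the set, so pages on cycles that are
    entered from the set are conservatively left out.
    """
    succ = defaultdict(set)  # page -> pages it links to
    pred = defaultdict(set)  # page -> pages linking to it
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    members = {gateway}
    frontier = [gateway]
    while frontier:
        u = frontier.pop()
        for v in succ[u]:
            # Admit v only if all of its in-neighbors are already members.
            if v not in members and pred[v] <= members:
                members.add(v)
                frontier.append(v)
    return members


# Hypothetical toy graph: "ext" stands for the rest of the web.
edges = [
    ("ext", "home"),
    ("home", "a"),
    ("home", "b"),
    ("a", "c"),
    ("b", "c"),
    ("ext", "d"),   # "d" is also linked from outside, so it is excluded
    ("home", "d"),
]
result = peninsula(edges, "home")
```

In this toy graph the pages `a`, `b`, and `c` can be reached only by passing through `home`, whereas `d` has a direct link from outside and therefore stays off the peninsula. A crawler that never reaches `home` would miss all three inner pages, which is the kind of coverage effect the abstract refers to.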