In this article, I investigate the reliability, in the social science sense, of collecting informetric data about the World Wide Web by Web crawling. The investigation includes a critical examination of the practice of Web crawling and contrasts the results of content crawling with the results of link crawling. It is shown that Web crawling by search engines is intentionally biased and selective. I also report the results of a large-scale experimental simulation of Web crawling that illustrates the effects of different crawling policies on data collection. It is concluded that the reliability of Web crawling as a data collection technique is improved by fuller reporting of relevant crawling policies.