Effective web crawling

Authors: Carlos Castillo
Affiliations: University of Chile
Venue: ACM SIGIR Forum
Year: 2005

Citing 4

Web Structure, Dynamics and Page Quality
SPIRE 2002 Proceedings of the 9th International Symposium on String Processing and Information Retrieval

Cooperation Schemes between a Web Server and a Web Search Engine
LA-WEB '03 Proceedings of the First Conference on Latin American Web Congress

Scheduling Algorithms for Web Crawling
LA-WEBMEDIA '04 Proceedings of the WebMedia & LA-Web 2004 Joint Conference 10th Brazilian Symposium on Multimedia and the Web 2nd Latin American Web Congress

On the image content of a web segment: Chile as a case study
Journal of Web Engineering

Cited 7

Effect of word density on measuring words association
COMPUTE '08 Proceedings of the 1st Bangalore Annual Compute Conference

Efficiently detecting webpage updates using samples
ICWE'07 Proceedings of the 7th international conference on Web engineering

Web site traffic ranking estimation via SVM
ICIC'10 Proceedings of the Advanced intelligent computing theories and applications, and 6th international conference on Intelligent computing

Fixing the threshold for effective detection of near duplicate web documents in web crawling
ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications: Part I

A constrained crawling approach and its application to a specialised search engine
International Journal of Information and Communication Technology

E-FFC: an enhanced form-focused crawler for domain-specific deep web databases
Journal of Intelligent Information Systems

GAT: Platform for automatic context-aware mobile services for m-tourism
Expert Systems with Applications: An International Journal

Abstract

The key factors in the success of the World Wide Web are its large size and the absence of centralized control over its contents. Both factors are also the main sources of difficulty in locating information. The Web challenges traditional Information Retrieval methods: given its volume and rate of change, the coverage of modern search engines is relatively small. Moreover, the distribution of quality is highly skewed, and interesting pages are scarce compared with the rest of the content.