This paper documents hazardous situations on the Web that crawlers must address. This knowledge was accumulated over four years of developing and operating the Viúva Negra (VN) crawler, which feeds a search engine and a Web archive for the Portuguese Web. The design, implementation and evaluation of the VN crawler are also presented as a case study of Web crawler design. The case study provides tested crawling techniques that may be useful for the further development of crawlers. Copyright © 2007 John Wiley & Sons, Ltd.