A co-operative web services paradigm for supporting crawlers
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
In this paper we study how to make web servers (e.g., Apache) more crawler friendly. Current web servers offer the same interface to crawlers and to regular web surfers, even though the two have very different performance requirements. We evaluate simple, easy-to-incorporate modifications to web servers that yield significant bandwidth savings. Specifically, we propose that web servers export meta-data archives describing their content.
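The proposed meta-data archive could take many concrete forms; the abstract does not specify one. As a minimal sketch, assuming a hypothetical JSON-lines format (the function name, record fields, and file layout below are illustrative, not the paper's actual design), a server-side job might periodically walk the document root and emit one record per resource so a crawler can detect changes from a single download instead of re-fetching every page:

```python
import hashlib
import json
import os

def build_metadata_archive(doc_root, archive_path):
    """Walk the server's document root and write one JSON record per
    file: relative path, size in bytes, last-modified timestamp, and a
    content checksum. A crawler can fetch this one archive and compare
    checksums to decide which resources actually changed."""
    with open(archive_path, "w") as out:
        for dirpath, _dirs, files in os.walk(doc_root):
            for name in sorted(files):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                record = {
                    "path": os.path.relpath(path, doc_root),
                    "size": os.path.getsize(path),
                    "last_modified": os.path.getmtime(path),
                    "checksum": digest,
                }
                out.write(json.dumps(record) + "\n")
```

A crawler that cached the previous archive would re-download only resources whose checksum differs, which is the source of the bandwidth savings the paper targets.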