A co-operative web services paradigm for supporting crawlers
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
In this paper we study how to make web servers (e.g., Apache) more crawler friendly. Current web servers offer the same interface to crawlers and to regular web surfers, even though the two have very different performance requirements. We evaluate simple, easy-to-incorporate modifications to web servers that yield significant bandwidth savings. Specifically, we propose that web servers export meta-data archives describing their content.
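The proposed meta-data archive could take many concrete forms; the abstract does not specify one. As a minimal sketch, assuming a hypothetical JSON-lines format (the function name, record fields, and file layout below are illustrative, not the paper's actual design), a server-side job might periodically walk the document root and emit one record per resource so a crawler can detect changes from a single download instead of re-fetching every page:

```python
import hashlib
import json
import os

def build_metadata_archive(doc_root, archive_path):
    """Walk the server's document root and write one JSON record per
    file: relative path, size in bytes, last-modified timestamp, and a
    content checksum. A crawler can fetch this one archive and compare
    checksums to decide which resources actually changed."""
    with open(archive_path, "w") as out:
        for dirpath, _dirs, files in os.walk(doc_root):
            for name in sorted(files):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                record = {
                    "path": os.path.relpath(path, doc_root),
                    "size": os.path.getsize(path),
                    "last_modified": os.path.getmtime(path),
                    "checksum": digest,
                }
                out.write(json.dumps(record) + "\n")
```

A crawler that cached the previous archive would re-download only resources whose checksum differs, which is the source of the bandwidth savings the paper targets.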