Search engines are an essential component of the web, but their crawling agents can impose a significant burden on heavily loaded web servers. Unfortunately, blocking or deferring web crawler requests is not a viable solution because of the economic consequences. We conduct a quantitative measurement study of the impact and cost of web crawling agents, seeking optimization opportunities for this class of requests. Based on our measurements, we present a practical caching approach for mitigating search engine overload and implement the resulting two-level cache scheme on a very busy web server. Our experimental results show that the proposed caching framework effectively reduces the impact of search engine overload on service quality.
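The abstract does not spell out the two-level cache design, so the following is only a minimal illustrative sketch of the general idea: crawler requests are identified (here by a naive User-Agent check, a stand-in for the probabilistic detection methods in the literature) and served from a two-level cache, with a small first-level LRU backed by a larger second level that stands in for on-disk storage. All names (`TwoLevelCache`, `handle_request`, the capacity values, and the User-Agent hint list) are hypothetical, not taken from the paper.

```python
from collections import OrderedDict

# Illustrative User-Agent substrings; real detection is more involved.
CRAWLER_UA_HINTS = ("Googlebot", "bingbot", "Baiduspider")


def is_crawler(user_agent: str) -> bool:
    """Naive crawler check; stands in for probabilistic detection."""
    ua = user_agent.lower()
    return any(hint.lower() in ua for hint in CRAWLER_UA_HINTS)


class TwoLevelCache:
    """L1: small in-memory LRU; L2: larger store (a stand-in for disk)."""

    def __init__(self, l1_capacity: int = 2, l2_capacity: int = 8):
        self.l1: OrderedDict[str, str] = OrderedDict()
        self.l2: OrderedDict[str, str] = OrderedDict()
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity

    def get(self, url: str):
        if url in self.l1:
            self.l1.move_to_end(url)          # refresh LRU position
            return self.l1[url]
        if url in self.l2:
            value = self.l2.pop(url)
            self.put(url, value)              # promote L2 hit into L1
            return value
        return None

    def put(self, url: str, page: str) -> None:
        self.l1[url] = page
        self.l1.move_to_end(url)
        if len(self.l1) > self.l1_capacity:
            # Demote the least-recently-used L1 entry to L2.
            old_url, old_page = self.l1.popitem(last=False)
            self.l2[old_url] = old_page
            if len(self.l2) > self.l2_capacity:
                self.l2.popitem(last=False)   # evict oldest L2 entry


def handle_request(cache: TwoLevelCache, url: str, user_agent: str, render_page):
    """Serve crawler requests from cache when possible; render otherwise."""
    if not is_crawler(user_agent):
        return render_page(url)               # interactive users get fresh content
    page = cache.get(url)
    if page is None:
        page = render_page(url)               # miss: render once, then cache
        cache.put(url, page)
    return page
```

In this sketch only crawler traffic takes the cached path, so human visitors still see freshly generated pages while repeated crawler fetches of the same URL avoid redundant server-side work.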