A comparison of web robot and human requests

Authors:
Derek Doran;Kevin Morillo;Swapna S. Gokhale
Affiliations:
Univ. of Connecticut, Storrs, CT;Univ. of Connecticut, Storrs, CT;Univ. of Connecticut, Storrs, CT
Venue:
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Year:
2013

Abstract

Sophisticated Web robots exhibit a wide variety of functionality and visiting characteristics, and they account for a significant percentage of the requests serviced by a Web server. Unlike human clients, who retrieve information from a site by navigating links and ignoring irrelevant information, Web robots may collect many different types of resources and employ varying navigation strategies to find the information they seek on a site. Thus, the resource request patterns of their visits are unpredictable and cannot be inferred from our knowledge of human request patterns. In this paper, we analyze the types of resources requested by Web robots using recent Web logs from an academic Web server. We study the distribution of response sizes and response codes, the types of resources requested, and the popularity of resources for requests from Web robots. Throughout, we contrast our findings against human resource request patterns. We find reasons to suggest that robots severely handicap the ability of Web server caches to operate with high performance.
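
The kind of log analysis the abstract describes can be illustrated with a minimal sketch. It is not the authors' method: it assumes the server writes Apache-style Combined Log Format lines, and it separates robots from humans with a naive user-agent keyword check (the hypothetical `ROBOT_HINTS` list), which a real study would replace with a proper robot detection technique. The helper names `is_robot` and `summarize` are likewise illustrative.

```python
import re
from collections import Counter
from os.path import splitext
from urllib.parse import urlsplit

# Assumed input: Combined Log Format
# host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical keyword list; an assumption, not the paper's detector.
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def is_robot(agent):
    """Naive check: does the user-agent string contain a robot keyword?"""
    agent = agent.lower()
    return any(hint in agent for hint in ROBOT_HINTS)


def summarize(lines):
    """Tally response codes, resource types, resource popularity,
    and response sizes separately for robot and human requests."""
    stats = {
        group: {
            "codes": Counter(),      # response code -> request count
            "types": Counter(),      # file extension -> request count
            "resources": Counter(),  # resource path -> request count
            "bytes": [],             # response sizes in bytes
        }
        for group in ("robot", "human")
    }
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not parse
        group = "robot" if is_robot(m["agent"]) else "human"
        path = urlsplit(m["url"]).path
        ext = splitext(path)[1].lower() or "(none)"
        s = stats[group]
        s["codes"][m["status"]] += 1
        s["types"][ext] += 1
        s["resources"][path] += 1
        if m["size"] != "-":  # "-" means no response body was sent
            s["bytes"].append(int(m["size"]))
    return stats
```

Calling `summarize(open("access.log"))` on a log file (the filename is a placeholder) returns the two sets of tallies, so that, for example, `stats["robot"]["resources"].most_common(10)` can be set side by side with the same ranking for humans.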