Sophisticated Web robots exhibit a wide variety of functionality and visiting characteristics, and account for a significant percentage of the requests serviced by a Web server. Unlike human clients, who retrieve information from a site by navigating links and ignoring irrelevant information, Web robots may collect many different types of resources and employ varying navigation strategies to find the information they seek. Thus, the resource request patterns of their visits are unpredictable and cannot be inferred from our knowledge of human request patterns. In this paper, we analyze the types of resources requested by Web robots using recent Web logs from an academic Web server. We study the distribution of response sizes and response codes, the types of resources requested, and the popularity of resources for requests from Web robots. Throughout, we contrast our findings against human resource request patterns. We find reasons to suggest that robots severely handicap the ability of Web server caches to operate with high performance.
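To make the kind of log analysis described above concrete, the following is a minimal sketch, not the method used in the paper. It assumes the server writes Apache-style Combined Log Format lines, and it separates robots from humans with a simple user-agent keyword heuristic; the keyword list `ROBOT_HINTS` and the helper names `classify_client`, `resource_type`, and `summarize` are hypothetical and chosen only for illustration. For each client class it tallies response codes, requested resource types (by file extension), and total response bytes.

```python
import re
from collections import Counter, defaultdict

# Assumed input: Combined Log Format,
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Hypothetical keyword heuristic; real robot detection is considerably harder.
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def classify_client(agent):
    """Label a request as 'robot' or 'human' from its user-agent string."""
    lowered = agent.lower()
    return "robot" if any(hint in lowered for hint in ROBOT_HINTS) else "human"


def resource_type(path):
    """Return the lowercased file extension of a request path, or '(none)'."""
    name = path.split("?", 1)[0].rsplit("/", 1)[-1]
    if "." not in name:
        return "(none)"
    return name.rsplit(".", 1)[-1].lower()


def summarize(lines):
    """Tally response codes, resource types, and bytes per client class."""
    summary = defaultdict(
        lambda: {"status": Counter(), "types": Counter(), "bytes": 0}
    )
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        entry = summary[classify_client(match["agent"])]
        entry["status"][match["status"]] += 1
        entry["types"][resource_type(match["path"])] += 1
        size = match["size"]
        entry["bytes"] += 0 if size == "-" else int(size)
    return dict(summary)
```

Calling `summarize(open("access.log"))` on a log file (the file name here is only an example) returns one entry per client class, so the robot and human distributions can be compared side by side.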