The traffic produced by the periodic crawling activities of Web robots often accounts for a sizable fraction of a website's overall traffic and can therefore have a non-negligible effect on its performance. Our study focuses on the traffic generated on the SPEC website by many different Web robots, including those employed by some popular search engines. This extensive investigation shows that the behavior and crawling patterns of the robots vary significantly in terms of the requests, resources and clients involved in their crawling activities. Some robots tend to concentrate their requests in short periods of time and follow deterministic patterns characterized by multiple peaks. The requests of other robots exhibit time-dependent behavior and repeated patterns with some periodicity. We represent the traffic as a time series modelled in the frequency domain. The identified models, consisting of trigonometric polynomials and autoregressive moving average (ARMA) components, accurately summarize the behavior of the overall traffic as well as the traffic of individual robots. These models can readily be used as a basis for forecasting.
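The abstract does not give the details of the fitted models, so the following is only a minimal sketch of the general idea, under stated assumptions: a periodogram locates the dominant periodicities, a trigonometric polynomial is fitted at those frequencies, and the remaining residuals are modelled with an AR(1) term, used here as a simplified stand-in for a full ARMA component. The hourly request counts are synthetic, and all function names (`periodogram`, `fit_harmonics`, `trend`, `ar1_coefficient`, `forecast`) are hypothetical, not taken from the paper.

```python
import math
import random

def periodogram(x):
    """Power at Fourier frequencies k = 1 .. (n-1)//2 (k cycles per series length)."""
    n = len(x)
    mean = sum(x) / n
    powers = {}
    for k in range(1, (n - 1) // 2 + 1):
        w = 2 * math.pi * k / n
        re = sum((v - mean) * math.cos(w * t) for t, v in enumerate(x))
        im = sum((v - mean) * math.sin(w * t) for t, v in enumerate(x))
        powers[k] = (re * re + im * im) / n
    return powers

def fit_harmonics(x, ks):
    """Least-squares cosine/sine coefficients at Fourier frequencies ks.

    At Fourier frequencies the regressors are orthogonal, so each
    coefficient is a simple projection of the mean-removed series.
    """
    n = len(x)
    mean = sum(x) / n
    coeffs = {}
    for k in ks:
        w = 2 * math.pi * k / n
        a = 2 / n * sum((v - mean) * math.cos(w * t) for t, v in enumerate(x))
        b = 2 / n * sum((v - mean) * math.sin(w * t) for t, v in enumerate(x))
        coeffs[k] = (a, b)
    return mean, coeffs

def trend(t, n, mean, coeffs):
    """Evaluate the fitted trigonometric polynomial at time index t."""
    return mean + sum(
        a * math.cos(2 * math.pi * k * t / n) + b * math.sin(2 * math.pi * k * t / n)
        for k, (a, b) in coeffs.items()
    )

def ar1_coefficient(resid):
    """Lag-1 autocorrelation of the residuals (Yule-Walker estimate for AR(1))."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(r * r for r in resid)
    return num / den

# Synthetic data (an assumption, not the SPEC logs): one week of hourly
# request counts with a daily cycle (k = 7), a half-daily cycle (k = 14)
# and autocorrelated noise.
rng = random.Random(0)
n = 168
noise, series = 0.0, []
for t in range(n):
    noise = 0.6 * noise + rng.gauss(0, 2)
    series.append(100
                  + 30 * math.cos(2 * math.pi * 7 * t / n)
                  + 10 * math.sin(2 * math.pi * 14 * t / n)
                  + noise)

# Keep the two strongest periodogram peaks as the periodic component.
powers = periodogram(series)
peaks = sorted(sorted(powers, key=powers.get, reverse=True)[:2])
mean, coeffs = fit_harmonics(series, peaks)

# Model what is left over with an AR(1) term.
resid = [series[t] - trend(t, n, mean, coeffs) for t in range(n)]
phi = ar1_coefficient(resid)

def forecast(h):
    """h-step-ahead forecast: periodic part plus decaying AR(1) correction."""
    return trend(n - 1 + h, n, mean, coeffs) + phi ** h * resid[-1]

next_hour = forecast(1)
```

Splitting the model this way keeps the deterministic periodic part separate from the short-memory stochastic part: forecasts far ahead converge to the trigonometric polynomial alone, because the AR(1) correction decays geometrically with the horizon. For real logs, a library ARMA fit would be a more faithful choice than the AR(1) shortcut used here.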