Detecting web crawlers from web server access logs with data mining classifiers

Authors:
Dusan Stevanovic;Aijun An;Natalija Vlajic
Affiliations:
Department of Computer Science and Engineering, York University, Toronto, Ontario, Canada;Department of Computer Science and Engineering, York University, Toronto, Ontario, Canada;Department of Computer Science and Engineering, York University, Toronto, Ontario, Canada
Venue:
ISMIS'11 Proceedings of the 19th international conference on Foundations of intelligent systems
Year:
2011

Abstract

In this study, we introduce two novel features, the consecutive sequential request ratio and the standard deviation of page request depth, to improve the accuracy with which traditional data mining classifiers classify malicious and non-malicious web crawlers from static web server access logs. In the first experiment, we evaluate the new features on the classification of known well-behaved web crawlers and human visitors. In the second experiment, we evaluate them on the classification of malicious web crawlers, unknown visitors, well-behaved crawlers, and human visitors. Classification performance is evaluated in terms of classification accuracy and F1 score. The experimental results demonstrate the potential of the two new features to improve the accuracy of data mining classifiers in identifying malicious and well-behaved web crawler sessions.
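
As a rough illustration of two quantities named in the abstract, the sketch below computes a per-session standard deviation of page request depth and the accuracy and F1 score of a set of predictions. The abstract does not define page depth, so the code assumes it is the number of non-empty path segments in the requested URL; the function names (`page_depth`, `depth_std`, `accuracy_and_f1`) and the label `"crawler"` are hypothetical and are not taken from the paper.

```python
from statistics import pstdev
from urllib.parse import urlparse


def page_depth(url: str) -> int:
    """Depth of a requested page.

    Assumption: depth is the count of non-empty path segments,
    e.g. "/a/b/c.html" has depth 3 and "/" has depth 0.
    """
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def depth_std(session_urls: list[str]) -> float:
    """Population standard deviation of page depth over one session."""
    depths = [page_depth(url) for url in session_urls]
    return pstdev(depths) if depths else 0.0


def accuracy_and_f1(y_true: list[str], y_pred: list[str],
                    positive: str = "crawler") -> tuple[float, float]:
    """Accuracy and F1 score, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    accuracy = correct / len(pairs) if pairs else 0.0
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives.
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


if __name__ == "__main__":
    session = ["/", "/docs/guide/intro.html", "/docs/guide/setup.html"]
    print(depth_std(session))
    print(accuracy_and_f1(["crawler", "human", "crawler", "human"],
                          ["crawler", "crawler", "human", "human"]))
```

A session that stays at one directory level yields a depth deviation of zero, while one that moves between shallow and deep pages yields a larger value. Which pattern indicates a crawler is an empirical question that the abstract does not answer.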