Feature evaluation for web crawler detection with data mining techniques

Authors:
Dusan Stevanovic; Aijun An; Natalija Vlajic
Affiliation (all authors):
Department of Computer Science and Engineering, York University, 4700 Keele St., Toronto, Ontario, Canada M3J 1P3
Venue:
Expert Systems with Applications: An International Journal
Year:
2012


Abstract

Distributed Denial of Service (DDoS) is one of the most damaging attacks on Internet security today. Recently, malicious web crawlers have been used to execute automated DDoS attacks on web sites across the WWW. In this study we examine the effect of applying seven well-established data mining classification algorithms to static web server access logs in order to: (1) classify user sessions as belonging either to automated web crawlers or to human visitors, and (2) identify which of the automated web crawler sessions exhibit 'malicious' behavior and are potentially participants in a DDoS attack. Classification performance is evaluated in terms of accuracy, recall, precision and F1 score. Seven of the nine web-session features (i.e., elements of the feature vector describing each session) employed in our work are borrowed from earlier studies on classifying user sessions as belonging to web crawlers. However, we also introduce two novel web-session features: the consecutive sequential request ratio and the standard deviation of page request depth. The effectiveness of the new features is evaluated in terms of the information gain and gain ratio metrics. The experimental results demonstrate the potential of the new features to improve the accuracy of data mining classifiers in identifying malicious and well-behaved web crawler sessions.
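The abstract names the two new session features and the evaluation metrics but gives no formulas. The Python sketch below is one plausible reading under stated assumptions, not the authors' implementation. It assumes a session is an ordered list of requested URLs, that page depth is the number of non-empty path segments, and that the consecutive sequential request ratio is the fraction of back-to-back requests that stay in the same directory. Information gain and gain ratio use the standard entropy-based definitions (gain ratio as in C4.5), applied to a hypothetical single-threshold split of one numeric feature. All function names and the toy sessions are illustrative only.

```python
import math
import statistics
from collections import Counter
from urllib.parse import urlsplit


def page_depth(url: str) -> int:
    # Assumed definition: depth = number of non-empty path segments,
    # e.g. "/a/b/c.html" -> 3.
    return len([seg for seg in urlsplit(url).path.split("/") if seg])


def page_directory(url: str) -> str:
    # Directory part of the path, e.g. "/a/b/c.html" -> "/a/b".
    return urlsplit(url).path.rsplit("/", 1)[0]


def consecutive_sequential_ratio(urls: list[str]) -> float:
    # Assumed definition: share of back-to-back request pairs whose pages
    # sit in the same directory (a crawler sweeping one folder scores high).
    if len(urls) < 2:
        return 0.0
    same = sum(page_directory(a) == page_directory(b) for a, b in zip(urls, urls[1:]))
    return same / (len(urls) - 1)


def depth_std(urls: list[str]) -> float:
    # Population standard deviation of page depth over the session.
    depths = [page_depth(u) for u in urls]
    return statistics.pstdev(depths) if depths else 0.0


def entropy(labels: list[str]) -> float:
    # Shannon entropy (in bits) of the class-label distribution.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_and_ratio(values: list[float], labels: list[str], threshold: float) -> tuple[float, float]:
    # Information gain and gain ratio of the hypothetical binary split
    # "value <= threshold" on a single numeric session feature.
    n = len(labels)
    parts = [
        [y for x, y in zip(values, labels) if x <= threshold],
        [y for x, y in zip(values, labels) if x > threshold],
    ]
    parts = [p for p in parts if p]  # ignore an empty side of the split
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts)
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts)
    return gain, (gain / split_info if split_info > 0 else 0.0)


def classification_scores(y_true: list[str], y_pred: list[str], positive: str) -> dict[str, float]:
    # Accuracy, precision, recall and F1 with `positive` as the target class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy sessions invented for illustration; not data from the study.
    crawler = ["/a/1.html", "/a/2.html", "/a/3.html", "/a/b/4.html"]
    human = ["/index.html", "/news/today.html", "/index.html"]
    for name, session in (("crawler", crawler), ("human", human)):
        print(name, consecutive_sequential_ratio(session), round(depth_std(session), 3))
```

In this toy run the folder-sweeping session gets a ratio of 2/3, while the session hopping between the root and a section gets 0. Gain ratio divides information gain by the split information, which penalizes features that fragment the data into many small partitions. A full evaluation would sweep candidate thresholds (or use a proper discretizer) rather than fix one by hand.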