Discovery of Web Robot Sessions Based on their Navigational Patterns
Data Mining and Knowledge Discovery
Identifying Interesting Customers through Web Log Classification
IEEE Intelligent Systems
Securing web service by automatic robot detection
ATEC '06 Proceedings of the annual conference on USENIX '06 Annual Technical Conference
Web robot detection: A probabilistic reasoning approach
Computer Networks: The International Journal of Computer and Telecommunications Networking
Detection of cloaked web spam by using tag-based methods
Expert Systems with Applications: An International Journal
Monitoring the application-layer DDoS attacks for popular websites
IEEE/ACM Transactions on Networking (TON)
Malicious web content detection by machine learning
Expert Systems with Applications: An International Journal
CAPTCHA: using hard AI problems for security
EUROCRYPT'03 Proceedings of the 22nd international conference on Theory and applications of cryptographic techniques
Modeling human behavior for defense against flash-crowd attacks
ICC'09 Proceedings of the 2009 IEEE international conference on Communications
Web Spambot Detection Based on Web Navigation Behaviour
AINA '10 Proceedings of the 2010 24th IEEE International Conference on Advanced Information Networking and Applications
Hi-index | 12.05 |
Distributed Denial of Service (DDoS) is one of the most damaging attacks on the Internet security today. Recently, malicious web crawlers have been used to execute automated DDoS attacks on web sites across the WWW. In this study we examine the effect of applying seven well-established data mining classification algorithms on static web server access logs in order to: (1) classify user sessions as belonging to either automated web crawlers or human visitors and (2) identify which of the automated web crawlers sessions exhibit 'malicious' behavior and are potentially participants in a DDoS attack. The classification performance is evaluated in terms of classification accuracy, recall, precision and F"1 score. Seven out of nine vector (i.e. web-session) features employed in our work are borrowed from earlier studies on classification of user sessions as belonging to web crawlers. However, we also introduce two novel web-session features: the consecutive sequential request ratio and standard deviation of page request depth. The effectiveness of the new features is evaluated in terms of the information gain and gain ratio metrics. The experimental results demonstrate the potential of the new features to improve the accuracy of data mining classifiers in identifying malicious and well-behaved web crawler sessions.