Most modern Web robots that crawl the Internet to support value-added services and technologies possess sophisticated data collection and analysis capabilities. Some of these robots, however, may be ill-behaved or malicious and hence may impose a significant strain on a Web server. It is thus necessary to detect Web robots so that undesirable ones can be blocked from accessing the server. Such detection is also essential to ensure that robot traffic is properly accounted for in the performance and capacity planning of Web servers. Although a variety of Web robot detection techniques exist, there is no consensus on a single technique, or even a specific "type" of technique, that performs well in practice. Therefore, to aid in the development of a practically applicable robot detection technique, this survey presents a critical analysis and comparison of the prevalent detection approaches. We propose a framework that classifies the existing detection techniques into four categories based on their underlying detection philosophy, and we compare these classes to identify the characteristics that make up an effective robot detection scheme. Finally, we discuss why contemporary techniques fail to offer a general solution to the robot detection problem and propose a set of key ingredients necessary for strong Web robot detection.
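To make the idea of detecting robots from server-side data concrete, the following is a minimal sketch of one simple family of log-based heuristics. It is not the survey's method or any of its four categories as defined by the authors. It assumes sessions have already been reconstructed from an access log, and the `Session` type, the `looks_like_robot` function, and the `CRAWLER_TOKENS` list are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical substrings often found in self-identifying crawler User-Agent strings.
CRAWLER_TOKENS = ("bot", "crawler", "spider", "slurp")


@dataclass
class Session:
    """A hypothetical, already-reconstructed client session from an access log."""
    user_agent: str
    paths: list[str] = field(default_factory=list)


def looks_like_robot(session: Session) -> bool:
    """Flag a session using two simple log-based heuristics (illustrative only)."""
    ua = session.user_agent.lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        return True  # the client declares itself a crawler in its User-Agent
    if "/robots.txt" in session.paths:
        return True  # well-behaved robots usually fetch robots.txt before crawling
    return False


sessions = [
    Session("Mozilla/5.0 (compatible; Googlebot/2.1)", ["/index.html"]),
    Session("Mozilla/5.0 (Windows NT 10.0)", ["/robots.txt", "/a.html"]),
    Session("Mozilla/5.0 (Windows NT 10.0)", ["/index.html", "/logo.png"]),
]
print([looks_like_robot(s) for s in sessions])  # [True, True, False]
```

Heuristics like these only catch robots that cooperate: a malicious robot can send a browser-like User-Agent and never request robots.txt, which illustrates why a single simple technique does not offer a general solution to the detection problem.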