This paper proposes a new approach to detecting and containing Web crawler traversals based on clickstream data mining. Timely detection prevents crawlers from abusively consuming Web server resources and from possibly violating the privacy or copyright of site contents. Separating crawler clickstream data from that of regular users keeps usage analysis focused, which is valuable for profiling both regular users and crawlers. Our platform, named ClickTips, maintains a site-specific, updatable detection model that tags Web crawler traversals through incremental Web session inspection, together with a decision model that assesses whether containment is warranted. The goal is a model flexible enough to keep up with the continuous evolution of crawlers and capable of detecting crawler presence as early as possible. We use a real-world Web site case study both to illustrate the process and to evaluate the accuracy of the resulting classification models and their ability to discover previously unknown Web crawlers.
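To make the idea of incremental session inspection followed by a containment decision more concrete, here is a minimal sketch in Python. It is not the ClickTips model. The request fields, the session features (robots.txt access, HEAD-request ratio, missing-referrer ratio, image ratio), the thresholds, the `min_requests` warm-up and the containment budget are all hypothetical choices made for illustration. A hand-written decision tree stands in for a classifier that, in practice, would be learned from labelled, site-specific sessions and periodically retrained.

```python
from dataclasses import dataclass

IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif")


@dataclass
class Request:
    # Hypothetical clickstream record; field names are illustrative only.
    path: str
    method: str = "GET"
    referrer: str = ""


@dataclass
class SessionState:
    """Running counters updated one request at a time (incremental inspection)."""
    n: int = 0
    robots_txt: bool = False
    head: int = 0
    no_referrer: int = 0
    images: int = 0

    def update(self, req: Request) -> None:
        self.n += 1
        self.robots_txt = self.robots_txt or req.path == "/robots.txt"
        self.head += int(req.method == "HEAD")
        self.no_referrer += int(not req.referrer)
        self.images += int(req.path.lower().endswith(IMAGE_EXTS))


def tag_session(s: SessionState) -> str:
    """Hand-written tree with hypothetical thresholds, standing in for a
    classifier learned from labelled sessions. Assumes s.n >= 1."""
    if s.robots_txt:
        return "crawler"
    if s.no_referrer / s.n > 0.8 and s.images / s.n < 0.1:
        return "crawler"
    if s.head / s.n > 0.5:
        return "crawler"
    return "user"


def should_contain(label: str, s: SessionState, budget: int = 100) -> bool:
    # Hypothetical containment policy: act on crawlers that never requested
    # robots.txt, or on any crawler that exceeds a request budget.
    return label == "crawler" and (not s.robots_txt or s.n > budget)


def inspect_stream(requests, min_requests: int = 3):
    """Re-evaluate the session after every request and stop at the first
    crawler verdict. Returns (label, requests seen, contain?)."""
    s = SessionState()
    label = "user"
    for i, req in enumerate(requests, start=1):
        s.update(req)
        if s.n < min_requests:
            continue  # too little evidence to classify yet
        label = tag_session(s)
        if label == "crawler":
            return label, i, should_contain(label, s)
    return label, s.n, False


if __name__ == "__main__":
    declared_bot = [Request("/robots.txt")] + [Request(f"/p{i}.html") for i in range(5)]
    silent_bot = [Request(f"/p{i}.html") for i in range(5)]
    human = [
        Request("/index.html"),
        Request("/logo.png", referrer="/index.html"),
        Request("/about.html", referrer="/index.html"),
    ]
    print(inspect_stream(declared_bot))  # ('crawler', 3, False)
    print(inspect_stream(silent_bot))    # ('crawler', 3, True)
    print(inspect_stream(human))         # ('user', 3, False)
```

Re-evaluating after every request, rather than waiting for the session to close, is what lets a verdict arrive early. The `min_requests` warm-up trades detection latency against false positives on very short sessions. Keeping tagging (`tag_session`) separate from the containment decision (`should_contain`) mirrors the abstract's split between a detection model and a decision model.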