In this paper, we introduce a probabilistic modeling approach to the problem of detecting Web robots from Web-server access logs. More specifically, we construct a Bayesian network that automatically classifies access-log sessions as crawler- or human-induced by combining various pieces of evidence proven to characterize crawler and human behavior. Our approach first uses an adaptive-threshold technique to extract Web sessions from access logs. We then apply machine learning techniques to determine the parameters of the probabilistic model. Each session is assigned the class with the maximum posterior probability given the available evidence. We apply our method to real Web-server logs and obtain results that demonstrate the robustness and effectiveness of probabilistic reasoning for crawler detection.
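
The pipeline described above (session extraction, parameter learning, maximum-posterior classification) can be illustrated with a minimal sketch. It is not the model from the paper. It assumes the simplest Bayesian network structure, a naive Bayes classifier in which every evidence variable depends only on the class. It also replaces the adaptive-threshold session extraction with a fixed inactivity timeout, since the abstract does not specify the adaptive rule. All function names, the 1800-second timeout, the Laplace smoothing constant, and any feature names passed in are hypothetical.

```python
import math
from collections import defaultdict

def extract_sessions(requests, timeout=1800.0):
    """Group (client, timestamp) pairs into sessions.

    Assumption: a FIXED inactivity timeout (seconds) stands in for the
    adaptive threshold mentioned in the abstract.
    """
    sessions = []
    open_session = {}  # client -> timestamps of that client's current session
    for client, ts in sorted(requests, key=lambda r: r[1]):
        current = open_session.get(client)
        if current is None or ts - current[-1] > timeout:
            current = []
            open_session[client] = current
            sessions.append((client, current))
        current.append(ts)
    return sessions

def fit(examples, alpha=1.0):
    """Estimate naive Bayes parameters from (features, label) pairs.

    features is a dict of discrete evidence values; alpha is a Laplace
    smoothing constant (an assumption, not taken from the source).
    """
    class_counts = defaultdict(int)
    counts = defaultdict(int)   # (label, feature, value) -> count
    values = defaultdict(set)   # feature -> values seen in training
    for feats, label in examples:
        class_counts[label] += 1
        for f, v in feats.items():
            counts[(label, f, v)] += 1
            values[f].add(v)
    total = sum(class_counts.values())
    priors = {c: n / total for c, n in class_counts.items()}

    def likelihood(label, f, v):
        # Smoothed estimate of P(feature = value | class = label).
        num = counts.get((label, f, v), 0) + alpha
        den = class_counts[label] + alpha * len(values.get(f, ()))
        return num / den

    return priors, likelihood

def posterior(model, feats):
    """Return P(class | evidence) for every class, computed in log space."""
    priors, likelihood = model
    log_joint = {
        c: math.log(p) + sum(math.log(likelihood(c, f, v)) for f, v in feats.items())
        for c, p in priors.items()
    }
    top = max(log_joint.values())
    unnorm = {c: math.exp(s - top) for c, s in log_joint.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}

def classify(model, feats):
    """Pick the class with the maximum posterior probability."""
    post = posterior(model, feats)
    return max(post, key=post.get)
```

In use, each extracted session would be summarized as a dict of discrete evidence values (for example, a hypothetical flag for whether robots.txt was requested), a model would be fitted on labeled sessions with `fit`, and `classify` would then label new sessions. A full Bayesian network could encode dependencies among the evidence variables, which the naive structure assumed here ignores.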