Web robots are software programs that automatically traverse the hyperlink structure of the World Wide Web in order to locate and retrieve information. There are several reasons why it is important to identify visits by Web robots and distinguish them from visits by other users. First, e-commerce retailers are particularly concerned about the unauthorized deployment of robots for gathering business intelligence at their Web sites. Second, Web robots tend to consume considerable network bandwidth at the expense of other users. Third, sessions due to Web robots make it more difficult to perform clickstream analysis effectively on the Web data. Conventional techniques for detecting Web robots are often based on identifying the IP address and user agent of the Web clients. While these techniques are applicable to many well-known robots, they may not be sufficient to detect camouflaged and previously unknown robots. In this paper, we propose an alternative approach that uses the navigational patterns in the clickstream data to determine whether a session is due to a robot. Experimental results on our Computer Science department Web server logs show that highly accurate classification models can be built using this approach. We also show that these models are able to discover many camouflaged and previously unidentified robots.
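As a rough illustration of what "navigational patterns" could mean in practice, the sketch below summarizes one session as a handful of numeric features and applies a hand-written rule to them. Everything here is an assumption made for illustration: the Request record, the feature names, the image-extension list, and the thresholds in looks_like_robot are hypothetical and are not taken from the paper, which builds classification models from the data rather than using a fixed rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Request:
    """One logged request in a session (hypothetical record layout)."""
    timestamp: float  # seconds, any consistent clock
    method: str       # HTTP method, e.g. "GET" or "HEAD"
    path: str         # requested URL path


# Assumed list of extensions treated as embedded images.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def session_features(requests: List[Request]) -> Dict[str, Union[int, float, bool]]:
    """Summarize the navigational behaviour of one session."""
    if not requests:
        raise ValueError("session must contain at least one request")
    n = len(requests)
    ordered = sorted(requests, key=lambda r: r.timestamp)
    # Time between consecutive requests within the session.
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    return {
        "num_requests": n,
        "image_fraction": sum(
            r.path.lower().endswith(IMAGE_EXTENSIONS) for r in requests
        ) / n,
        "head_fraction": sum(r.method.upper() == "HEAD" for r in requests) / n,
        "requested_robots_txt": any(r.path == "/robots.txt" for r in requests),
        "mean_gap_seconds": sum(gaps) / len(gaps) if gaps else 0.0,
    }


def looks_like_robot(features: Dict[str, Union[int, float, bool]]) -> bool:
    """Hypothetical fixed rule standing in for a learned classifier."""
    if features["requested_robots_txt"]:
        return True
    return (
        features["num_requests"] >= 3
        and features["image_fraction"] == 0.0
        and features["head_fraction"] > 0.5
    )
```

In a real setting the feature vectors would be fed to a trained classifier instead of a fixed rule; the rule is only there to make the sketch self-contained.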