Evaluation of web robot discovery techniques: a benchmarking study

  • Authors:
  • Nick Geens; Johan Huysmans; Jan Vanthienen

  • Affiliations:
  • Department of Decision Sciences and Information Management, Katholieke Universiteit Leuven, Leuven, Belgium (all authors)

  • Venue:
  • ICDM'06: Proceedings of the 6th Industrial Conference on Data Mining, Advances in Data Mining: Applications in Medicine, Web Mining, Marketing, Image and Signal Mining
  • Year:
  • 2006

Abstract

This paper describes part of a web usage mining study conducted on log files obtained from a Belgian e-commerce company. These log files show that numerous web robots are active on the site. Most of these robots exhibit crawling behavior that is radically different from the browsing behavior of human visitors. Because the owners of the e-shop want information about the paths that human visitors follow through the site, it is crucial to remove these robotic visits from the log files. Several existing methods for web robot discovery are evaluated and compared, none of which yields satisfactory results. Therefore, a new technique is developed that identifies web robots successfully and reliably.