A Hybrid Approach to Web Usage Mining

  • Authors:
  • Søren E. Jespersen;Jesper Thorhauge;Torben Bach Pedersen

  • Affiliations:
  • -;-;-

  • Venue:
  • DaWaK 2000 Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

With the large number of companies using the Internet to distribute and collect information, knowledge discovery on the web has become an important research area. Web usage mining, which is the main topic of this paper, focuses on knowledge discovery from the clicks in the web log for a given site (the so-called click-stream), especially on analysis of sequences of clicks. Existing techniques for analyzing click sequences have different drawbacks, i.e., either huge storage requirements, excessive I/O cost, or scalability problems when additional information is introduced into the analysis.In this paper we present a new hybrid approach for analyzing click sequences that aims to overcome these drawbacks. The approach is based on a novel combination of existing approaches, more specifically the Hypertext Probabilistic Grammar (HPG) and Click Fact Table approaches. The approach allows for additional information, e.g., user demographics, to be included in the analysis without introducing performance problems. The development is driven by experiences gained from industry collaboration. A prototype has been implemented and experiments are presented that show that the hybrid approach performs well compared to the existing approaches. This is especially true when mining sessions containing clicks with certain characteristics, i.e., when constraints are introduced. The approach is not limited to web log analysis, but can also be used for general sequence mining tasks.