A session generalization technique for improved web usage mining

  • Authors:
  • Tahira Hasan;Sudhir P. Mudur;Nematollaah Shiri

  • Affiliations:
  • Concordia University, Montreal, PQ, Canada;Concordia University, Montreal, PQ, Canada;Concordia University, Montreal, PQ, Canada

  • Venue:
  • Proceedings of the eleventh international workshop on Web information and data management
  • Year:
  • 2009

Abstract

Generalization of web sessions is an effective approach to two major challenges in web usage mining: quality and scalability. Given a concept hierarchy, such as the page hierarchy of a website, generalization replaces actual page clicks with more general concepts, i.e., nodes at higher levels of the hierarchy. Currently known methods do this by choosing a level in the hierarchy and generalizing all nodes below it to nodes at that level. The problem with this approach is that significant items may be coalesced while insignificant ones are retained. We present a usage-driven generalization algorithm that coalesces less significant pages into more general ones, independent of their level in the hierarchy. Item significance is estimated from the actual set of usage sessions, approximately but quickly, using a small stratified sample of the large dataset. While providing scalability, the proposed generalization technique improves the efficiency and quality of the discovered usage model, as demonstrated through numerous experiments.
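
The idea of usage-driven generalization can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes that the hierarchy is a child-to-parent map, that significance is measured as session support (the fraction of sampled sessions touching a node or any of its descendants), and that a page is coalesced into its nearest ancestor whose support reaches a threshold. The names `estimate_support`, `generalize`, and `min_support` are hypothetical, and the stratified sampling step is omitted: the sketch simply takes whatever sample of sessions it is given.

```python
from collections import Counter


def estimate_support(sample_sessions, parent):
    """Fraction of sampled sessions that touch each node or any of its descendants.

    Assumption: support is used as a stand-in for "item significance".
    """
    counts = Counter()
    for session in sample_sessions:
        touched = set()
        for page in session:
            node = page
            # Walk up to the root so that ancestors aggregate their descendants' usage.
            while node is not None:
                touched.add(node)
                node = parent.get(node)
        counts.update(touched)
    total = len(sample_sessions)
    return {node: count / total for node, count in counts.items()}


def generalize(session, parent, support, min_support):
    """Replace each insufficiently supported page with its nearest significant ancestor."""
    result = []
    for page in session:
        node = page
        # Climb only while the node is insignificant and a parent exists.
        while support.get(node, 0.0) < min_support and parent.get(node) is not None:
            node = parent[node]
        # Collapse consecutive duplicates produced by coalescing.
        if not result or result[-1] != node:
            result.append(node)
    return result


# Hypothetical site hierarchy (child -> parent) and sampled sessions.
parent = {
    "/": None,
    "/products": "/",
    "/products/a": "/products",
    "/products/b": "/products",
    "/help": "/",
    "/help/faq": "/help",
    "/help/contact": "/help",
}
sessions = [
    ["/products/a", "/help/faq"],
    ["/products/a", "/help/contact"],
    ["/products/a", "/products/b"],
    ["/products/a"],
]

support = estimate_support(sessions, parent)
generalized = [generalize(s, parent, support, min_support=0.5) for s in sessions]
```

In this toy data, `/products/a` and `/products/b` sit at the same level of the hierarchy, yet the frequently visited `/products/a` is retained while `/products/b` is coalesced into `/products`. A fixed-level cut would have to treat both pages the same way.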