Clustering of web sessions using Levenshtein metric

  • Authors:
  • Andrei Scherbina;Sergey Kuznetsov

  • Affiliations:
  • RAS, Institute for System Programming, Moscow;RAS, Institute for System Programming, Moscow

  • Venue:
  • ICDM'04 Proceedings of the 4th international conference on Advances in Data Mining: applications in Image Mining, Medicine and Biotechnology, Management and Environmental Control, and Telecommunications
  • Year:
  • 2004

Abstract

Various commercial and scientific applications require analysis of user behaviour on the Internet. For example, web marketing and network technical support can benefit from classification of web users. This is achievable by tracking the pages visited by a user during one session (one visit to a particular site). For automated classification of user sessions, we propose a distance that compares sessions by the sequence of pages in them and by the categories of these pages. The proposed distance is based on the Levenshtein metric. The Fuzzy C-Medoids algorithm was used for clustering, since it has almost linear complexity. The Davies-Bouldin, Entropy, and Bezdek validity indices were used to assess the quality of the proposed method. As testing shows, our distance outperforms both the Euclidean and Edit distances in this domain.
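The abstract does not give the exact form of the proposed distance, so the following is only a minimal sketch of one way a Levenshtein-style distance could take both page order and page categories into account. The function name `session_distance`, the `category` mapping, and the `same_category_cost` parameter are hypothetical, and the cost scheme (a cheaper substitution when two pages share a category) is an assumption made for illustration, not the authors' definition.

```python
from typing import Dict, Sequence


def session_distance(
    a: Sequence[str],
    b: Sequence[str],
    category: Dict[str, str],
    same_category_cost: float = 0.5,
) -> float:
    """Levenshtein-style distance between two sessions (sequences of page IDs).

    Assumption (not from the paper): substituting a page with a different page
    of the same category costs `same_category_cost`; any other substitution,
    insertion, or deletion costs 1.
    """

    def sub_cost(p: str, q: str) -> float:
        if p == q:
            return 0.0
        cp, cq = category.get(p), category.get(q)
        if cp is not None and cp == cq:
            return same_category_cost
        return 1.0

    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = [float(j) for j in range(len(b) + 1)]
    for i, p in enumerate(a, start=1):
        curr = [float(i)]
        for j, q in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1.0,                 # delete p from a
                curr[j - 1] + 1.0,             # insert q into a
                prev[j - 1] + sub_cost(p, q),  # substitute p with q
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical page-to-category mapping for a small shop site.
    cats = {"/home": "landing", "/shoes": "catalog",
            "/hats": "catalog", "/cart": "checkout"}
    s1 = ["/home", "/shoes", "/cart"]
    s2 = ["/home", "/hats", "/cart"]
    print(session_distance(s1, s2, cats))  # 0.5
```

With an empty `category` mapping the function reduces to the plain Levenshtein (edit) distance over page sequences, which makes it easy to compare the two variants on the same sessions. A pairwise matrix of such distances is the kind of input a medoid-based clustering method needs, since medoids require only distances between sessions and not coordinates.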