Sequence-based clustering for Web usage mining: A new experimental framework and ANN-enhanced K-means algorithm

  • Authors: Sungjune Park; Nallan C. Suresh; Bong-Keun Jeong

  • Affiliations: The University of North Carolina at Charlotte, Business Information Systems and Operations Management, The Belk College of Business, 9201 University City Blvd, Charlotte, NC 28223, United States; Department of Operations Management and Strategy, School of Management, State University of New York, Buffalo, NY 14260, United States; The University of North Carolina at Charlotte, Business Information Systems and Operations Management, The Belk College of Business, 9201 University City Blvd, Charlotte, NC 28223, United States

  • Venue: Data & Knowledge Engineering
  • Year: 2008

Abstract

We develop a general sequence-based clustering method by proposing new sequence representation schemes in association with Markov models. The resulting sequence representations allow vector-based distances (dissimilarities) between Web user sessions to be calculated, so they can serve as inputs to various clustering algorithms. We develop an evaluation framework in which the performance of the algorithms is compared in terms of whether the clusters (groups of Web users who follow the same Markov process) are correctly identified using a replicated clustering approach. A series of experiments is conducted to investigate whether clustering performance is affected by different sequence representations and different distance measures, as well as by other factors such as the number of actual Web user clusters, the number of Web pages, the similarity between clusters, the minimum session length, the number of user sessions, and the number of clusters to form. A new, fuzzy ART-enhanced K-means algorithm is also developed and its superior performance is demonstrated.
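
The abstract does not spell out the representation schemes, so the following is only a minimal sketch of the general idea: turning a session into a fixed-length vector so that a vector-based distance can be computed. It assumes pages are encoded as integers, uses normalized first-order transition frequencies as the representation, and uses Euclidean distance. The names `transition_vector` and `euclidean` are hypothetical, and the sessions are made-up toy data, not data from the paper.

```python
from math import sqrt


def transition_vector(session, n_pages):
    """Flatten one session's first-order transition frequencies into a vector.

    Assumption: this is one plausible Markov-style representation,
    not necessarily one of the schemes proposed in the paper.
    """
    counts = [0.0] * (n_pages * n_pages)
    for src, dst in zip(session, session[1:]):
        counts[src * n_pages + dst] += 1.0
    total = sum(counts)
    # Normalize so that sessions of different lengths are comparable.
    return [c / total for c in counts] if total else counts


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy sessions over 3 pages (0, 1, 2); illustrative only.
sessions = [
    [0, 1, 2, 0, 1, 2],  # cycles 0 -> 1 -> 2
    [0, 1, 2, 0, 1],     # same navigation pattern, shorter session
    [2, 1, 0, 2, 1, 0],  # reversed cycle 2 -> 1 -> 0
]
vectors = [transition_vector(s, n_pages=3) for s in sessions]

d_same = euclidean(vectors[0], vectors[1])  # similar navigation behavior
d_diff = euclidean(vectors[0], vectors[2])  # different navigation behavior
print(d_same < d_diff)
```

Vectors of this kind could then be passed to any vector-based clustering algorithm, such as K-means. The choice of representation and distance measure is among the factors the experiments examine.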