Categorizing Visitors Dynamically by Fast and Robust Clustering of Access Logs

  • Authors:
  • Vladimir Estivill-Castro; Jianhua Yang

  • Venue:
  • WI '01 Proceedings of the First Asia-Pacific Conference on Web Intelligence: Research and Development
  • Year:
  • 2001

Abstract

Clustering plays a central role in market segmentation. Identifying categories of visitors to a Web site is very useful for improving Web applications. However, the large volume of data involved in mining visitation paths demands efficient clustering algorithms that are also resistant to noise and outliers. In addition, measuring dissimilarity between visitation paths requires sophisticated evaluation and results in attribute vectors of high dimension. We present a randomized, iterative algorithm (à la Expectation Maximization or k-means) that is based on discrete medoids. We prove that our algorithm converges and that it has subquadratic complexity. We compare it with an implementation of the fastest version of matrix-based clustering for visitor paths and show that our algorithm dramatically outperforms matrix-based methods.
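
To make the idea of a randomized, iterative, medoid-based clustering concrete, the following is a minimal Python sketch. It is not the authors' algorithm, whose details the abstract does not give. It is a generic sampled k-medoids loop under stated assumptions: a visitation path is a list of page identifiers, the dissimilarity is one minus the Jaccard similarity of the visited page sets, and the names `path_dissimilarity`, `assign`, `sampled_medoid_clustering` and the parameter `sample_size` are all hypothetical. Each iteration evaluates only a small random sample of candidate medoids per cluster, so one pass costs roughly n * (k + sample_size) dissimilarity evaluations rather than the n^2 of a full dissimilarity matrix.

```python
import random


def path_dissimilarity(a, b):
    """Assumed dissimilarity: 1 - Jaccard similarity of the visited page sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def assign(paths, medoids):
    """Label each path with the position (in `medoids`) of its nearest medoid."""
    return [
        min(range(len(medoids)),
            key=lambda j: path_dissimilarity(p, paths[medoids[j]]))
        for p in paths
    ]


def sampled_medoid_clustering(paths, k, sample_size=10, max_iter=50, seed=0):
    """Iterative medoid clustering with randomly sampled medoid candidates.

    Returns (medoids, labels): `medoids` holds indices into `paths`, and
    `labels[i]` is the cluster number of `paths[i]`.
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(paths)), k)  # random initial medoids
    labels = assign(paths, medoids)
    for _ in range(max_iter):
        changed = False
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                continue

            def cost(c):
                # Total dissimilarity from candidate c to the cluster members.
                return sum(path_dissimilarity(paths[c], paths[i]) for i in members)

            # Test only a random sample of members as replacement medoids.
            candidates = rng.sample(members, min(sample_size, len(members)))
            best = min(candidates + [medoids[j]], key=cost)
            if cost(best) < cost(medoids[j]):
                medoids[j] = best
                changed = True
        if not changed:
            break  # heuristic stop: no sampled candidate improved any cluster
        labels = assign(paths, medoids)
    return medoids, labels


if __name__ == "__main__":
    demo_paths = [
        ["home", "products", "cart"],
        ["home", "products"],
        ["home", "products", "cart", "checkout"],
        ["blog", "about"],
        ["blog", "contact"],
        ["blog", "about", "contact"],
    ]
    print(sampled_medoid_clustering(demo_paths, k=2, sample_size=3))
```

Because each cluster representative is an actual visitation path (a discrete medoid) rather than an averaged vector, the sketch only ever needs pairwise dissimilarities, which is the property that makes medoid methods a natural fit for paths with no meaningful mean. The stopping rule here is a heuristic and carries none of the convergence guarantees claimed in the abstract.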