Model-Based Clustering and Visualization of Navigation Patterns on a Web Site

  • Authors:
  • Igor Cadez; David Heckerman; Christopher Meek; Padhraic Smyth; Steven White

  • Affiliations:
  • Sparta Inc., 23382 Mill Creek Drive, #100 Laguna Hills, CA 92653, USA. igor_cadez@sparta.com
  • Microsoft Research, One Microsoft Way, Redmond, WA 98052-6399, USA. heckerma@microsoft.com
  • Microsoft Research, One Microsoft Way, Redmond, WA 98052-6399, USA. meek@microsoft.com
  • School of Information and Computer Science, University of California, Irvine, CA 92697-3425, USA. smyth@ics.uci.edu
  • Microsoft Research, One Microsoft Way, Redmond, WA 98052-6399, USA. stevewh@microsoft.com

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2003

Abstract

We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our approach, we first partition site users into clusters such that users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is model-based (as opposed to distance-based) and partitions users according to the order in which they request web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. The runtime of our algorithm scales linearly with the number of clusters and with the size of the data, and our implementation easily handles hundreds of thousands of user sessions in memory. In the paper, we describe the details of our method and a visualization tool based on it called WebCANVAS. We illustrate the use of our approach on user-traffic data from msnbc.com.
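
To make the clustering step concrete, the following is a minimal sketch of EM for a mixture of first-order Markov chains. It is not the authors' implementation or WebCANVAS. Under such a model, a session x_1, ..., x_T has probability sum_k w_k * p(x_1 | k) * prod_t p(x_t | x_{t-1}, k). The function name fit_markov_mixture, the integer coding of URL categories, the smoothing pseudo-count alpha, the fixed iteration count, and the random initialization are all assumptions made for illustration.

```python
import math
import random


def fit_markov_mixture(sequences, n_states, n_clusters,
                       n_iter=50, alpha=0.01, seed=0):
    """Illustrative EM for a mixture of first-order Markov chains.

    sequences: non-empty lists of ints in [0, n_states), one per session.
    Returns (weights, init, trans, resp), where resp[i][k] is the
    posterior probability that session i belongs to cluster k.
    """
    rng = random.Random(seed)
    n = len(sequences)
    # Assumption: start from random soft assignments.
    resp = []
    for _ in sequences:
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(n_iter):
        # M-step: responsibility-weighted counts, smoothed by alpha.
        weights = [(sum(r[k] for r in resp) + alpha) / (n + n_clusters * alpha)
                   for k in range(n_clusters)]
        init = [[alpha] * n_states for _ in range(n_clusters)]
        trans = [[[alpha] * n_states for _ in range(n_states)]
                 for _ in range(n_clusters)]
        for seq, r in zip(sequences, resp):
            for k in range(n_clusters):
                init[k][seq[0]] += r[k]
                for a, b in zip(seq, seq[1:]):
                    trans[k][a][b] += r[k]
        for k in range(n_clusters):
            s = sum(init[k])
            init[k] = [v / s for v in init[k]]
            for a in range(n_states):
                s = sum(trans[k][a])
                trans[k][a] = [v / s for v in trans[k][a]]

        # E-step: cluster posteriors per session, computed in log space.
        for i, seq in enumerate(sequences):
            logp = []
            for k in range(n_clusters):
                lp = math.log(weights[k]) + math.log(init[k][seq[0]])
                for a, b in zip(seq, seq[1:]):
                    lp += math.log(trans[k][a][b])
                logp.append(lp)
            m = max(logp)
            unnorm = [math.exp(lp - m) for lp in logp]
            total = sum(unnorm)
            resp[i] = [u / total for u in unnorm]

    return weights, init, trans, resp


if __name__ == "__main__":
    # Hypothetical sessions over two URL categories (0 and 1).
    sessions = [[0, 1, 0, 1, 0, 1]] * 4 + [[0, 0, 0, 0, 0, 0]] * 4
    _, _, _, resp = fit_markov_mixture(sessions, n_states=2, n_clusters=2)
    labels = [max(range(2), key=lambda k: r[k]) for r in resp]
    print(labels)
```

In this sketch each EM iteration visits every transition of every session once per cluster, so the per-iteration cost grows linearly with the number of clusters and with the total length of the data, which matches the scaling behavior described in the abstract. Which cluster index a group of sessions receives depends on the random initialization.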