A Fast Algorithm for Subspace Clustering by Pattern Similarity

  • Authors:
  • Haixun Wang; Fang Chu; Wei Fan; Philip S. Yu; Jian Pei

  • Affiliations:
  • IBM T.J. Watson Research Center; Univ. of California, Los Angeles; IBM T.J. Watson Research Center; IBM T.J. Watson Research Center; SUNY Buffalo

  • Venue:
  • SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management
  • Year:
  • 2004

Abstract

Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends the concept of traditional clustering and benefits a wide range of applications, including large-scale scientific data analysis, target marketing, and web usage analysis. However, state-of-the-art pattern-based clustering methods (e.g., the pCluster algorithm) can only handle datasets of thousands of records, which makes them inappropriate for many real-life applications. Furthermore, besides their huge volume, many datasets are also characterized by their sequentiality; for instance, customer purchase records and network event logs are usually modeled as data sequences. Hence, it becomes important to enable pattern-based clustering methods i) to handle large datasets, and ii) to discover pattern similarity embedded in data sequences. In this paper, we present a novel algorithm that offers this capability. Experimental results on both real-life and synthetic datasets demonstrate its effectiveness and efficiency.
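To make "a coherent pattern of rise and fall in subspaces" concrete, here is a minimal sketch of the δ-pCluster criterion from the earlier pCluster work that the abstract names. It is not the scalable algorithm of this paper. For objects x, y and dimensions a, b, pScore = |(x_a − y_a) − (x_b − y_b)|, and a set of objects over a subset of dimensions forms a δ-pCluster when every such 2×2 pScore is at most δ. The function names `p_score` and `is_delta_pcluster` and the three sample objects are hypothetical, chosen only for illustration. The check is brute force over all pairs.

```python
import math
from itertools import combinations


def p_score(x, y, a, b):
    """pScore of the 2x2 submatrix formed by objects x, y on dimensions a, b.

    A value of 0 means x and y rise and fall identically between a and b,
    even if their absolute values differ by a constant shift.
    """
    return abs((x[a] - y[a]) - (x[b] - y[b]))


def is_delta_pcluster(objects, dims, delta):
    """True if every pair of objects and every pair of dimensions in `dims`
    has pScore <= delta (brute-force check of the delta-pCluster condition)."""
    return all(
        p_score(x, y, a, b) <= delta
        for x, y in combinations(objects, 2)
        for a, b in combinations(dims, 2)
    )


def project(obj, dims):
    """Restrict an object to the given subspace (list of dimension indices)."""
    return [obj[d] for d in dims]


# Hypothetical objects over five dimensions. On dims 0, 2, 4 the object o2 is
# o1 shifted by +10, so the two share a pattern there; o3 does not.
o1 = [1, 7, 5, 3, 9]
o2 = [11, 2, 15, 40, 19]
o3 = [4, 7, 1, 3, 8]
subspace = [0, 2, 4]

print(is_delta_pcluster([o1, o2], subspace, delta=0))       # True
print(is_delta_pcluster([o1, o2], range(5), delta=0))       # False: pattern holds only in the subspace
print(is_delta_pcluster([o1, o2, o3], subspace, delta=0))   # False

# Value-based (Euclidean) distance in the same subspace ranks o3 closer to o1,
# which is why pattern similarity differs from traditional value similarity.
print(round(math.dist(project(o1, subspace), project(o2, subspace)), 1))  # 17.3
print(round(math.dist(project(o1, subspace), project(o3, subspace)), 1))  # 5.1
```

The last two lines illustrate the contrast the abstract draws: o1 and o2 are far apart by value yet form a perfect pattern cluster in the subspace {0, 2, 4}, while o3 is close by value but breaks the pattern.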