MaPle: A Fast Algorithm for Maximal Pattern-based Clustering

  • Authors:
  • Jian Pei; Xiaoling Zhang; Moonjung Cho; Haixun Wang; Philip S. Yu

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Abstract

Pattern-based clustering is important in many applications, such as DNA micro-array data analysis, automatic recommendation systems, and target marketing systems. However, pattern-based clustering in large databases is challenging. On the one hand, there can be a huge number of clusters, many of which are redundant and thus make pattern-based clustering ineffective. On the other hand, previously proposed methods may not be efficient or scalable in mining large databases.

In this paper, we study the problem of maximal pattern-based clustering. Redundant clusters are avoided completely by mining only the maximal pattern-based clusters. MaPle, an efficient and scalable mining algorithm, is developed. It conducts a depth-first, divide-and-conquer search and prunes unnecessary branches smartly. Our extensive performance study on both synthetic and real data sets shows that maximal pattern-based clustering is effective: it reduces the number of clusters substantially. Moreover, MaPle is more efficient and scalable than previously proposed pattern-based clustering methods in mining large databases.
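
To make the notion of redundancy concrete, the following is a minimal sketch of what "keeping only maximal clusters" means. It assumes a cluster can be represented as a pair (set of objects, set of attributes) and that one cluster is redundant when both of its sets are contained in those of another cluster. The names Cluster, is_subcluster, and maximal_clusters are hypothetical and do not come from the paper. This is a naive post-filter for illustration only; it is not MaPle, which according to the abstract avoids redundant clusters during the search itself rather than filtering them afterwards.

```python
from typing import FrozenSet, List, Tuple

# Assumed representation: a cluster is (objects, attributes).
Cluster = Tuple[FrozenSet[str], FrozenSet[str]]


def is_subcluster(a: Cluster, b: Cluster) -> bool:
    """Return True if cluster a is strictly contained in cluster b,
    i.e. a differs from b and both its object set and its attribute
    set are subsets of b's."""
    return a != b and a[0] <= b[0] and a[1] <= b[1]


def maximal_clusters(clusters: List[Cluster]) -> List[Cluster]:
    """Keep only clusters that are not contained in any other cluster.
    Quadratic in the number of clusters; input order is preserved and
    exact duplicates are collapsed."""
    unique = list(dict.fromkeys(clusters))
    return [
        c for c in unique
        if not any(is_subcluster(c, other) for other in unique)
    ]


# Usage with made-up objects (o1..o5) and attributes (a, b, c).
big = (frozenset({"o1", "o2", "o3"}), frozenset({"a", "b", "c"}))
sub = (frozenset({"o1", "o2"}), frozenset({"a", "b"}))      # inside big
other = (frozenset({"o4", "o5"}), frozenset({"a", "b"}))    # different objects
result = maximal_clusters([big, sub, other])
```

Here sub is dropped because it is contained in big, while other survives because its objects are not covered by any other cluster.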