Scalable model-based cluster analysis using clustering features

  • Authors:
  • Huidong Jin; Kwong-Sak Leung; Man-Leung Wong; Zong-Ben Xu

  • Affiliations:
  • CSIRO Data Mining Research, GPO Box 664, Canberra ACT 2601, Australia; Department of Computer Science & Engineering, CUHK, Shatin, Hong Kong; Department of Computing and Decision Sciences, Lingnan University, Tuen Mun, Hong Kong; Faculty of Science, Xi'an Jiaotong University, 710049 Xi'an, P.R. China

  • Venue:
  • Pattern Recognition
  • Year:
  • 2005

Abstract

We present two scalable model-based clustering systems based on a Gaussian mixture model with independent attributes within clusters. They first summarize the data into sub-clusters and then generate Gaussian mixtures from the sub-clusters' clustering features using a new algorithm, EMACF. EMACF approximates the aggregate behavior of each sub-cluster of data items in the Gaussian mixture model, and it provably converges. Experiments show that our clustering systems run one to two orders of magnitude faster than the traditional EM algorithm, with little loss of accuracy.
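To make the two steps above concrete, here is a minimal sketch in Python of EM run over clustering features rather than raw data items. It assumes each sub-cluster's clustering feature is the common triple (count N, per-attribute linear sum S, per-attribute sum of squares Q), that sub-clusters have already been formed, and that each Gaussian component has a diagonal covariance, matching "independent attributes within clusters". The names `clustering_feature` and `emacf_sketch` are hypothetical. The E-step scores each whole sub-cluster by the average log-density of its members, which can be computed from (N, S, Q) alone. This follows the idea stated in the abstract but is not presented as the authors' exact EMACF update rules.

```python
import math


def clustering_feature(points):
    """Summarize a sub-cluster as (N, per-attribute sum, per-attribute sum of squares)."""
    n = len(points)
    dims = len(points[0])
    s = [sum(p[d] for p in points) for d in range(dims)]
    q = [sum(p[d] ** 2 for p in points) for d in range(dims)]
    return (n, s, q)


def emacf_sketch(cfs, weights, means, variances, iters=50, min_var=1e-6):
    """Hypothetical EM over clustering features for a diagonal-covariance mixture.

    Updates the parameter lists in place and returns them. Each sub-cluster
    is treated as a unit: its responsibility is based on the average
    log-density of its members, derived from (N, S, Q) without the raw data.
    """
    k = len(weights)
    dims = len(means[0])
    total = sum(n for n, _, _ in cfs)
    for _ in range(iters):
        # E-step: responsibility of each component for each whole sub-cluster.
        resp = []
        for n, s, q in cfs:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for d in range(dims):
                    var, mu = variances[j][d], means[j][d]
                    # Mean of (x - mu)^2 over the sub-cluster's members.
                    sq = q[d] / n - 2 * mu * s[d] / n + mu * mu
                    lp += -0.5 * math.log(2 * math.pi * var) - sq / (2 * var)
                logs.append(lp)
            top = max(logs)  # subtract the max for a numerically stable softmax
            ex = [math.exp(v - top) for v in logs]
            z = sum(ex)
            resp.append([e / z for e in ex])
        # M-step: each sub-cluster contributes its summary weighted by responsibility.
        for j in range(k):
            nj = sum(r[j] * n for r, (n, _, _) in zip(resp, cfs))
            if nj < 1e-12:
                continue  # sketch-level guard: leave an empty component unchanged
            weights[j] = nj / total
            for d in range(dims):
                mu = sum(r[j] * s[d] for r, (_, s, _) in zip(resp, cfs)) / nj
                # Sum over members of (x - mu)^2 is Q - 2*mu*S + N*mu^2.
                sq = sum(r[j] * (q[d] - 2 * mu * s[d] + n * mu * mu)
                         for r, (n, s, q) in zip(resp, cfs))
                means[j][d] = mu
                variances[j][d] = max(sq / nj, min_var)
        tw = sum(weights)
        weights[:] = [w / tw for w in weights]
    return weights, means, variances
```

Because each iteration touches only the M sub-cluster summaries instead of the N raw data items, its cost scales with M rather than N; this is where an approach of this kind can gain speed when M is much smaller than N.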