Scaling-Up Model-Based Clustering Algorithm by Working on Clustering Features

Authors:
Huidong Jin;Kwong-Sak Leung;Man Leung Wong
Venue:
IDEAL '02 Proceedings of the Third International Conference on Intelligent Data Engineering and Automated Learning
Year:
2002

Citing 4

BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data

Density biased sampling: an improved method for data mining and clustering
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data

Very fast EM-based mixture model clustering using multiresolution kd-trees
Proceedings of the 1998 conference on Advances in neural information processing systems II

An expectation-maximization algorithm working on data summary
Second international workshop on Intelligent systems design and application

Cited 2

A delivery framework for health data mining and analytics
ACSC '05 Proceedings of the Twenty-eighth Australasian conference on Computer Science - Volume 38

Scalable model-based cluster analysis using clustering features
Pattern Recognition

Abstract

In this paper, we propose EMACF (Expectation-Maximization Algorithm for Clustering Features), which generates clusters from data summaries rather than directly from individual data items. Combining EMACF with an adaptive grid-based data summarization procedure, we establish a scalable clustering algorithm, gEMACF. Experimental results show that gEMACF generates more accurate results than other scalable clustering algorithms, and that it can run two orders of magnitude faster than the traditional expectation-maximization algorithm with little loss of accuracy.
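
The abstract does not give the details of EMACF or of the adaptive grid, so the following is only a rough sketch of the general idea of running EM on data summaries, not the authors' algorithm. It assumes BIRCH-style clustering features (count, linear sum, sum of squares) computed over a fixed, non-adaptive grid, and a Gaussian mixture with diagonal covariances whose E-step is evaluated at each cell mean. The names `grid_summarize` and `em_on_summaries`, the `bins` parameter, and the farthest-point initialization are all hypothetical choices made for illustration.

```python
import numpy as np

def grid_summarize(X, bins=8):
    """Summarize points per occupied grid cell as (count, linear sum, sum of squares).

    Assumption: a fixed uniform grid, not the adaptive grid named in the abstract.
    """
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    # Integer cell index per dimension; clip so the maximum falls in the last bin.
    idx = np.minimum(((X - lo) / span * bins).astype(int), bins - 1)
    _, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    m = inverse.max() + 1
    n = np.bincount(inverse, minlength=m).astype(float)
    ls = np.zeros((m, X.shape[1]))
    ss = np.zeros((m, X.shape[1]))
    np.add.at(ls, inverse, X)        # per-cell linear sum
    np.add.at(ss, inverse, X ** 2)   # per-cell sum of squares
    return n, ls, ss

def em_on_summaries(n, ls, ss, k=2, iters=50):
    """Simplified weighted EM for a diagonal Gaussian mixture over cell summaries."""
    mean = ls / n[:, None]                                # cell means
    var_in = np.maximum(ss / n[:, None] - mean ** 2, 0)   # within-cell variance
    # Hypothetical initialization: heaviest cell first, then farthest cells.
    chosen = [int(np.argmax(n))]
    for _ in range(k - 1):
        d2 = ((mean[:, None, :] - mean[chosen][None]) ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(np.argmax(d2)))
    mu = mean[chosen].copy()
    gmean = np.average(mean, axis=0, weights=n)
    total_var = np.average(var_in + mean ** 2, axis=0, weights=n) - gmean ** 2
    sigma2 = np.tile(np.maximum(total_var, 1e-6), (k, 1))
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities per cell, evaluated at the cell mean.
        log_p = (np.log(pi)
                 - 0.5 * np.sum(np.log(2 * np.pi * sigma2), axis=1)
                 - 0.5 * np.sum((mean[:, None, :] - mu[None]) ** 2 / sigma2[None], axis=2))
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: each cell counts as n points; within-cell variance is added back.
        w = r * n[:, None]
        nk = w.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (w.T @ mean) / nk[:, None]
        second = (w.T @ (var_in + mean ** 2)) / nk[:, None]
        sigma2 = np.maximum(second - mu ** 2, 1e-6)
    return pi, mu, sigma2
```

Under these assumptions, each EM iteration touches only the occupied grid cells rather than every data item, which is where a speedup of the kind described in the abstract would come from. The sum-of-squares term lets the M-step recover the variance that is lost when a cell is collapsed to its mean.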