Introduction to statistical pattern recognition (2nd ed.)
CURE: an efficient clustering algorithm for large databases. In SIGMOD '98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data.
Automatic subspace clustering of high dimensional data for data mining applications. In SIGMOD '98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data.
OPTICS: ordering points to identify the clustering structure. In SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data.
A Monte Carlo algorithm for fast projective clustering. In Proceedings of the 2002 ACM SIGMOD international conference on Management of data.
FREM: fast and robust EM clustering for large data sets. In Proceedings of the eleventh international conference on Information and knowledge management.
BIRCH: A New Data Clustering Algorithm and Its Applications. Data Mining and Knowledge Discovery.
Supporting Ranked Boolean Similarity Queries in MARS. IEEE Transactions on Knowledge and Data Engineering.
CLARANS: A Method for Clustering Objects for Spatial Data Mining. IEEE Transactions on Knowledge and Data Engineering.
On-line EM Algorithm for the Normalized Gaussian Network. Neural Computation.
Scalable model-based cluster analysis using clustering features. Pattern Recognition.
Research of fast SOM clustering for text information. Expert Systems with Applications: An International Journal.
Weighted Fuzzy-Possibilistic C-Means Over Large Data Sets. International Journal of Data Warehousing and Mining.
We present an algorithm for generating a mixture model from a data set by converting the data into a model. The method is applicable when only part of the data fits in main memory at a time. The generated model is a Gaussian mixture model, but the algorithm can be adapted to other types of models as well. The user cannot specify the size of the generated model, so we also introduce a post-processing method that reduces the size of the model without using the original data. This results in a more compact model with fewer components but approximately the same representation accuracy as the original. Our comparisons show that the algorithm produces good results and is efficient: the whole process requires only 0.5-10% of the time spent by the expectation-maximization algorithm.
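The post-processing step described above shrinks a Gaussian mixture without revisiting the data. One standard way to do this, shown here as a minimal sketch rather than the paper's actual method, is to merge pairs of components so that the merged component preserves the pair's total weight, mean, and second moment; the function name and the 1-D restriction are illustrative assumptions.

```python
# Moment-preserving merge of two 1-D Gaussian mixture components.
# A sketch of one common reduction technique, not necessarily the
# method used in the paper; all names here are illustrative.

def merge_components(w1, mu1, var1, w2, mu2, var2):
    """Merge two (weight, mean, variance) components into one that
    preserves total weight, mean, and second moment of the pair."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    # Variance recovered from the preserved second moment: E[x^2] - mu^2.
    var = (w1 * (var1 + mu1 ** 2) + w2 * (var2 + mu2 ** 2)) / w - mu ** 2
    return w, mu, var

# Two equally weighted unit-variance components centred at 0 and 2
# merge into one component with mean 1 and variance 2.
print(merge_components(0.5, 0.0, 1.0, 0.5, 2.0, 1.0))  # (1.0, 1.0, 2.0)
```

Because the merge uses only the components' own parameters, it needs no access to the original data, which is exactly the property the post-processing step relies on.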