Introduction to statistical pattern recognition (2nd ed.)
CURE: an efficient clustering algorithm for large databases. In SIGMOD '98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data.
Automatic subspace clustering of high dimensional data for data mining applications. In SIGMOD '98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data.
OPTICS: ordering points to identify the clustering structure. In SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data.
A Monte Carlo algorithm for fast projective clustering. In Proceedings of the 2002 ACM SIGMOD international conference on Management of data.
FREM: fast and robust EM clustering for large data sets. In Proceedings of the eleventh international conference on Information and knowledge management.
BIRCH: A New Data Clustering Algorithm and Its Applications. Data Mining and Knowledge Discovery.
Supporting Ranked Boolean Similarity Queries in MARS. IEEE Transactions on Knowledge and Data Engineering.
CLARANS: A Method for Clustering Objects for Spatial Data Mining. IEEE Transactions on Knowledge and Data Engineering.
On-line EM Algorithm for the Normalized Gaussian Network. Neural Computation.
Scalable model-based cluster analysis using clustering features. Pattern Recognition.
Research of fast SOM clustering for text information. Expert Systems with Applications: An International Journal.
Weighted Fuzzy-Possibilistic C-Means Over Large Data Sets. International Journal of Data Warehousing and Mining.
We present an algorithm for generating a mixture model from a data set by converting the data into a model. The method is applicable when only part of the data fits in main memory at a time. The generated model is a Gaussian mixture model, but the algorithm can be adapted to other types of models as well. The user cannot specify the size of the generated model, so we also introduce a post-processing method that reduces the size of the model without using the original data. This results in a more compact model with fewer components but approximately the same representation accuracy as the original. Our comparisons show that the algorithm produces good results and is efficient: the whole process requires only 0.5-10% of the time spent by the expectation-maximization algorithm.
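The post-processing step described above shrinks a Gaussian mixture without revisiting the data. One standard way to do this, shown here as a minimal sketch rather than the paper's actual method, is to merge pairs of components so that the merged component preserves the pair's total weight, mean, and second moment; the function name and the 1-D restriction are illustrative assumptions.

```python
# Moment-preserving merge of two 1-D Gaussian mixture components.
# A sketch of one common reduction technique, not necessarily the
# method used in the paper; all names here are illustrative.

def merge_components(w1, mu1, var1, w2, mu2, var2):
    """Merge two (weight, mean, variance) components into one that
    preserves total weight, mean, and second moment of the pair."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    # Variance recovered from the preserved second moment: E[x^2] - mu^2.
    var = (w1 * (var1 + mu1 ** 2) + w2 * (var2 + mu2 ** 2)) / w - mu ** 2
    return w, mu, var

# Two equally weighted unit-variance components centred at 0 and 2
# merge into one component with mean 1 and variance 2.
print(merge_components(0.5, 0.0, 1.0, 0.5, 2.0, 1.0))  # (1.0, 1.0, 2.0)
```

Because the merge uses only the components' own parameters, it needs no access to the original data, which is exactly the property the post-processing step relies on.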