We present two scalable model-based clustering systems based on a Gaussian mixture model with independent attributes within clusters. Both first summarize the data into sub-clusters and then generate Gaussian mixtures from the sub-clusters' clustering features using a new algorithm, EMACF. EMACF approximates the aggregate behavior of each sub-cluster of data items in the Gaussian mixture model, and it provably converges. Experiments show that our clustering systems run one or two orders of magnitude faster than the traditional EM algorithm with little loss of accuracy.
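To make the two-stage idea concrete, the following is a minimal sketch and not the authors' EMACF. It assumes a fixed-width grid as the sub-cluster generator (the abstract does not say how sub-clusters are formed), takes a clustering feature to be the count, linear sum, and squared sum of a sub-cluster (the BIRCH-style convention), and fits a Gaussian mixture with independent attributes in which all items of a sub-cluster share one responsibility vector. The names summarize, em_on_summaries, and cell are hypothetical.

```python
import numpy as np


def summarize(X, cell=0.5):
    """Group points into grid cells (an assumed sub-cluster generator) and
    return each cell's count, mean, and per-attribute variance."""
    cells = np.floor(X / cell).astype(np.int64)
    uniq, inv = np.unique(cells, axis=0, return_inverse=True)
    inv = inv.ravel()
    m, d = len(uniq), X.shape[1]
    n = np.bincount(inv, minlength=m).astype(float)  # N: items per sub-cluster
    ls = np.zeros((m, d))                            # LS: linear sum
    ss = np.zeros((m, d))                            # SS: squared sum
    np.add.at(ls, inv, X)
    np.add.at(ss, inv, X ** 2)
    mean = ls / n[:, None]
    var = np.maximum(ss / n[:, None] - mean ** 2, 0.0)
    return n, mean, var


def em_on_summaries(n, mean, var, k, iters=50):
    """EM for a diagonal-covariance Gaussian mixture that only sees
    sub-cluster summaries (illustrative stand-in, not EMACF itself)."""
    total = n.sum()
    # Farthest-point initialisation over sub-cluster means (an assumption).
    centers = [int(np.argmax(n))]
    for _ in range(k - 1):
        d2 = ((mean[:, None, :] - mean[centers][None]) ** 2).sum(-1).min(1)
        centers.append(int(np.argmax(d2)))
    mu = mean[centers].copy()
    g_mean = (n[:, None] * mean).sum(0) / total
    g_var = (n[:, None] * (var + (mean - g_mean) ** 2)).sum(0) / total
    sigma2 = np.tile(np.maximum(g_var, 1e-6), (k, 1))
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: average log-density of a sub-cluster's items under each
        # component, computed from the sub-cluster mean and variance alone.
        sq = (mean[:, None, :] - mu[None]) ** 2 + var[:, None, :]
        log_p = -0.5 * (np.log(2 * np.pi * sigma2)[None] + sq / sigma2[None]).sum(-1)
        log_r = np.log(pi)[None] + log_p
        log_r -= log_r.max(1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(1, keepdims=True)
        # M-step: moments weighted by sub-cluster size times responsibility.
        w = n[:, None] * r
        nk = w.sum(0) + 1e-12
        pi = nk / total
        mu = (w.T @ mean) / nk[:, None]
        sigma2 = np.maximum((w.T @ (var + mean ** 2)) / nk[:, None] - mu ** 2, 1e-6)
    return pi, mu, sigma2


# Usage on synthetic data: two well-separated 2-D Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (2000, 2)),
               rng.normal(10.0, 1.0, (2000, 2))])
n, mean, var = summarize(X)
pi, mu, sigma2 = em_on_summaries(n, mean, var, k=2)
print(len(n), "sub-clusters summarize", len(X), "points")
print(pi, mu, sigma2, sep="\n")
```

Each EM iteration here costs time proportional to the number of sub-clusters rather than the number of data items, which is the source of the speedup the abstract describes. The accuracy of the result depends on how tightly the sub-clusters summarize the data, so the grid width in this sketch trades speed against fidelity.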