A Classification EM algorithm for clustering and two stochastic versions
Computational Statistics & Data Analysis - Special issue on optimization techniques in statistics
Bayesian classification (AutoClass): theory and results
Advances in knowledge discovery and data mining
Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables
Machine Learning - Special issue on learning with probabilistic representations
Algorithms for Model-Based Gaussian Hierarchical Clustering
SIAM Journal on Scientific Computing
Update rules for parameter estimation in Bayesian networks
UAI'97 Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence
Dimensionality Reduction in Unsupervised Learning of Conditional Gaussian Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence
Learning Recursive Bayesian Multinets for Data Clustering by Means of Constructive Induction
Machine Learning - Special issue: Unsupervised learning
Performance Evaluation of Some Clustering Algorithms and Validity Indices
IEEE Transactions on Pattern Analysis and Machine Intelligence
Hierarchical model-based clustering of relational data with aggregates
Proceedings of the 2004 ACM Symposium on Applied Computing
Organizing structured web sources by query schemas: a clustering approach
Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management
Fast Recognition of Musical Genres Using RBF Networks
IEEE Transactions on Knowledge and Data Engineering
Correlation clustering in general weighted graphs
Theoretical Computer Science - Approximation and online algorithms
Definition of MV load diagrams via weighted evidence accumulation clustering using subsampling
ISPRA'07 Proceedings of the 6th WSEAS International Conference on Signal Processing, Robotics and Automation
In search of deterministic methods for initializing K-means and Gaussian mixture clustering
Intelligent Data Analysis
A new initialization method for categorical data clustering
Expert Systems with Applications: An International Journal
An initialization method for the K-Means algorithm using neighborhood model
Computers & Mathematics with Applications
ICCSA'07 Proceedings of the 2007 International Conference on Computational Science and Its Applications - Volume Part III
Accelerating EM: an empirical study
UAI'99 Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence
Neighborhood density method for selecting initial cluster centers in k-means clustering
PAKDD'06 Proceedings of the 10th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Gossip-based greedy Gaussian mixture learning
PCI'05 Proceedings of the 10th Panhellenic Conference on Advances in Informatics
A two-stage genetic algorithm for automatic clustering
Neurocomputing
The effectiveness of Lloyd-type methods for the k-means problem
Journal of the ACM (JACM)
We examine methods for clustering in high dimensions. In the first part of the paper, we perform an experimental comparison between three batch clustering algorithms: the Expectation-Maximization (EM) algorithm, a "winner take all" version of the EM algorithm reminiscent of the K-means algorithm, and model-based hierarchical agglomerative clustering. We learn naive-Bayes models with a hidden root node, using high-dimensional discrete-variable data sets (both real and synthetic). We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization schemes on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of hierarchical agglomerative clustering. Although the methods are substantially different, they lead to learned models that are strikingly similar in quality.
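The model described above, a naive-Bayes network with a hidden root node, is equivalent to a mixture of independent discrete distributions, and the EM updates and the perturbed-marginal initialization (scheme 2) can be sketched concretely. The following is a minimal illustration for binary features, not the paper's actual implementation; the function name, the perturbation scale of 0.1, and the clipping bounds are all assumptions made for the sketch.

```python
import numpy as np

def em_naive_bayes_mixture(X, k, n_iter=50, seed=0):
    """EM for a mixture of independent Bernoulli features,
    i.e. a naive-Bayes model with a hidden root (class) node.
    X: (n, d) binary array. Returns mixing weights pi (k,),
    per-cluster feature probabilities theta (k, d), and
    responsibilities r (n, k)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialization scheme (2): each cluster's parameters are a
    # random perturbation of the data's marginal distribution.
    marginal = X.mean(axis=0)
    theta = np.clip(marginal + 0.1 * rng.standard_normal((k, d)), 0.01, 0.99)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each cluster for each point,
        # computed in log space for numerical stability.
        log_p = (X @ np.log(theta).T
                 + (1 - X) @ np.log(1 - theta).T
                 + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights and feature probabilities
        # from the soft assignments.
        nk = r.sum(axis=0)
        pi = nk / n
        theta = np.clip((r.T @ X) / nk[:, None], 0.01, 0.99)
    return pi, theta, r

# Usage sketch: two well-separated groups of binary vectors.
rng = np.random.default_rng(1)
X = np.vstack([(rng.random((100, 8)) < 0.9),
               (rng.random((100, 8)) < 0.1)]).astype(float)
pi, theta, r = em_naive_bayes_mixture(X, k=2)
labels = r.argmax(axis=1)
```

The "winner take all" variant compared in the paper differs only in the E-step: responsibilities are replaced by hard 0/1 assignments to the most probable cluster, which is what makes it reminiscent of K-means.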