The Expectation-Maximization (EM) algorithm is a popular tool in a wide variety of statistical settings, in particular for maximum likelihood estimation of parameters when clustering with mixture models. A serious pitfall is that, when the likelihood function is multimodal, the algorithm may become trapped at a local maximum, yielding an inferior clustering solution. In addition, convergence to an optimal solution can be very slow. Methods are proposed to address these issues: optimizing the starting values supplied to the algorithm and targeting the maximization steps efficiently. These approaches are demonstrated to produce superior outcomes to initialization via random starts or hierarchical clustering, and to greatly improve the rate of convergence to an optimal solution.
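To illustrate the sensitivity of EM to its starting values, the following sketch fits a two-component one-dimensional Gaussian mixture from several random starts and keeps the run with the highest final log-likelihood. This is a minimal, hypothetical example of the multiple-random-starts baseline mentioned above, not the authors' proposed method; all function names and parameter choices here are illustrative.

```python
import math
import random

def log_lik(data, w, mu1, mu2, s1, s2):
    """Log-likelihood of the data under the current mixture parameters."""
    ll = 0.0
    for x in data:
        p1 = w * math.exp(-(x - mu1) ** 2 / (2 * s1)) / math.sqrt(2 * math.pi * s1)
        p2 = (1 - w) * math.exp(-(x - mu2) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
        ll += math.log(p1 + p2)
    return ll

def em_fit(data, n_iter=200, rng=random):
    # Random initialization: component means drawn from the data, equal weights.
    mu1, mu2 = rng.sample(data, 2)
    w, s1, s2 = 0.5, 1.0, 1.0
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each point.
        resp = []
        for x in data:
            p1 = w * math.exp(-(x - mu1) ** 2 / (2 * s1)) / math.sqrt(2 * math.pi * s1)
            p2 = (1 - w) * math.exp(-(x - mu2) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
            resp.append(p1 / (p1 + p2))
        # M-step: responsibility-weighted means, variances, and mixing proportion.
        n1 = sum(resp)
        n2 = len(data) - n1
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / n2
        s1 = max(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1, 1e-6)
        s2 = max(sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2, 1e-6)
        w = n1 / len(data)
    return log_lik(data, w, mu1, mu2, s1, s2), (w, mu1, mu2, s1, s2)

rng = random.Random(0)
# Two well-separated clusters centred at 0 and 6.
data = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(6, 1) for _ in range(100)]
# Multiple random starts; keep the solution with the best log-likelihood.
best_ll, best_params = max(em_fit(data, rng=rng) for _ in range(5))
```

Keeping only the best-scoring run guards against individual starts that converge to a poor local maximum, at the cost of running EM several times; the starting-value optimization proposed in the paper aims to reduce exactly that overhead.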