New global optimization algorithms for model-based clustering

Authors:
Jeffrey W. Heath;Michael C. Fu;Wolfgang Jank
Affiliations:
Department of Mathematics, Centre College, Danville, KY 40422, United States;Robert H. Smith School of Business, University of Maryland, College Park, MD 20742, United States;Robert H. Smith School of Business, University of Maryland, College Park, MD 20742, United States
Venue:
Computational Statistics & Data Analysis
Year:
2009

Abstract

The Expectation-Maximization (EM) algorithm is a popular optimization tool for mixture problems, and in particular for model-based clustering. Although the algorithm is convenient to implement and numerically very stable, it produces only local solutions, so it may fail to reach the global optimum in problems with many local optima. This paper introduces several new algorithms designed to produce global solutions in model-based clustering. Their building blocks are two methods from the operations research literature: the Cross-Entropy (CE) method and Model Reference Adaptive Search (MRAS). One obstacle to applying these methods directly is the efficient simulation of positive definite covariance matrices, and we propose several new solutions to this problem. One solution applies the principles of EM updating, which leads to two new algorithms, CE-EM and MRAS-EM. We also propose two additional algorithms, CE-CD and MRAS-CD, which rely on the Cholesky decomposition. We conduct numerical experiments of varying complexity to evaluate the proposed algorithms against classical EM. We find that although a single run of each new algorithm is slower than a single run of EM, all have the potential to produce significantly better solutions. We also find that although repeated application of EM may achieve similar results, our algorithms provide automated, data-driven decision rules that may significantly reduce the burden of searching for the global optimum.
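
To illustrate why a Cholesky parameterization helps with simulating positive definite covariance matrices, the following is a minimal sketch and not the authors' CE-CD or MRAS-CD procedure. It assumes the free entries of a lower-triangular factor are drawn from independent normal distributions; the function name `sample_covariances`, the parameters `mu`, `sigma`, and `diag_floor`, and the choice to force a positive diagonal with an absolute value plus a floor are all assumptions made for this example.

```python
import numpy as np


def sample_covariances(rng, mu, sigma, d, n_samples, diag_floor=0.1):
    """Draw n_samples random d x d covariance matrices as L @ L.T.

    mu and sigma hold the mean and standard deviation of each of the
    d * (d + 1) / 2 free entries of the lower-triangular factor L.
    diag_floor is an arbitrary illustrative value (an assumption) that
    keeps the diagonal of L away from zero.
    """
    rows, cols = np.tril_indices(d)      # positions of the free entries of L
    diag = np.arange(d)
    draws = rng.normal(mu, sigma, size=(n_samples, rows.size))
    covs = np.empty((n_samples, d, d))
    for k in range(n_samples):
        L = np.zeros((d, d))
        L[rows, cols] = draws[k]
        # A nonzero diagonal makes L full rank, so L @ L.T is positive definite.
        L[diag, diag] = np.abs(L[diag, diag]) + diag_floor
        covs[k] = L @ L.T
    return covs


rng = np.random.default_rng(0)
d = 3
n_params = d * (d + 1) // 2
covs = sample_covariances(rng, np.zeros(n_params), np.ones(n_params), d, 5)

# Smallest eigenvalue of each sampled matrix; all should be positive.
min_eigs = np.linalg.eigvalsh(covs).min(axis=1)
print(min_eigs)
```

Because every draw of the unconstrained factor entries maps to a valid covariance matrix, a sampling-based search can update the sampling distribution over those entries without rejecting candidates that fail positive definiteness.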