Many practitioners who use EM and related algorithms complain that they are sometimes slow. When does this happen, and what can be done about it? In this paper, we study the general class of bound optimization algorithms, including EM, Iterative Scaling, Non-negative Matrix Factorization, and CCCP, and their relationship to direct optimization algorithms such as gradient-based methods for parameter learning. We derive a general relationship between the updates performed by bound optimization methods and those of gradient and second-order methods, and we identify analytic conditions under which bound optimization algorithms exhibit quasi-Newton behavior and conditions under which they possess poor, first-order convergence. Based on this analysis, we consider several specific algorithms, interpret and analyze their convergence properties, and provide some recipes for preprocessing input to these algorithms to yield faster convergence behavior. We report empirical results supporting our analysis and showing that simple data preprocessing can result in dramatically improved performance of bound optimizers in practice.
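To make the kind of behavior described above concrete, the sketch below runs EM, viewed as a bound optimizer, on a two-component one-dimensional Gaussian mixture. This is an illustrative NumPy sketch under our own assumptions, not the authors' code: the function name em_gmm_1d, the fixed two-component model, and the convergence tolerance are hypothetical choices for demonstration. The intent is only to show the qualitative effect the abstract analyzes: with well-separated components EM converges in very few iterations (near quasi-Newton behavior), whereas with heavily overlapping components the same updates slow to first-order progress.

```python
import numpy as np

def em_gmm_1d(x, n_iter=500, tol=1e-8, seed=0):
    """EM (a bound optimizer) for a two-component 1-D Gaussian mixture.

    Illustrative sketch only. Returns the number of iterations until the
    log-likelihood improvement drops below `tol`, plus the final log-likelihood.
    """
    rng = np.random.default_rng(seed)
    pi = 0.5
    mu = rng.choice(x, size=2, replace=False).astype(float)
    var = np.array([x.var(), x.var()])
    prev_ll = -np.inf
    for t in range(n_iter):
        # E-step: responsibilities; the EM bound touches the log-likelihood here.
        mix = np.array([pi, 1.0 - pi])[:, None]
        dens = (np.exp(-0.5 * (x - mu[:, None]) ** 2 / var[:, None])
                / np.sqrt(2 * np.pi * var[:, None]))
        w = mix * dens                       # joint weights, shape (2, n)
        ll = np.log(w.sum(axis=0)).sum()     # current log-likelihood
        r = w / w.sum(axis=0)                # posterior responsibilities
        # M-step: maximize the bound in closed form.
        nk = r.sum(axis=1)
        pi = nk[0] / x.size
        mu = (r * x).sum(axis=1) / nk
        var = (r * (x - mu[:, None]) ** 2).sum(axis=1) / nk
        if ll - prev_ll < tol:
            return t + 1, ll
        prev_ll = ll
    return n_iter, ll

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Well-separated components: near-deterministic responsibilities,
    # so EM typically converges in a handful of iterations.
    x_easy = np.concatenate([rng.normal(-5.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])
    # Heavily overlapping components: uncertain responsibilities,
    # and the same EM updates typically need many more iterations.
    x_hard = np.concatenate([rng.normal(-0.5, 1.0, 500), rng.normal(0.5, 1.0, 500)])
    print("well separated:", em_gmm_1d(x_easy))
    print("overlapping:   ", em_gmm_1d(x_hard))
```

In this toy setting, rescaling or otherwise separating the input data changes how peaked the responsibilities are, which is one simple way to see why the data preprocessing discussed in the abstract can speed up bound optimizers in practice.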