Cross-validation and aggregated EM training for robust parameter estimation

  • Authors:
  • Takahiro Shinozaki; Mari Ostendorf

  • Affiliations:
  • Academic Center for Computing and Media Studies, Kyoto University, Kyoto 606-8501, Japan; Department of Electrical Engineering, University of Washington, Seattle, WA 98195-2500, USA

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2008

Abstract

A new maximum likelihood training algorithm is proposed that compensates for the EM algorithm's tendency to overtrain by using cross-validation likelihood in the expectation step. By using a set of sufficient statistics associated with a partitioning of the training data, as in parallel EM, the algorithm has the same order of computational requirements as the original EM algorithm. Another variation uses an approximation of bagging to reduce variance in the E-step, at a somewhat higher cost. Analyses using GMMs with artificial data show that the proposed algorithms are more robust to overtraining than the conventional EM algorithm. Large-vocabulary recognition experiments on Mandarin broadcast news data show that the methods make better use of additional parameters and give lower recognition error rates than standard EM training.
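
To illustrate the cross-validation E-step idea described in the abstract, the following is a minimal sketch for a diagonal-covariance GMM: the data are split into folds, sufficient statistics are kept per fold, and the responsibilities for fold k are computed from a model built from the other folds' statistics, so no point is scored by parameters it helped estimate. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; all function names, the random-point initialization, and the first-iteration bootstrap from a shared initial model are choices made here for the sketch.

```python
import numpy as np

def log_gauss(X, means, variances):
    # Log density of each point under each diagonal Gaussian -> (n, K).
    return -0.5 * (np.log(2.0 * np.pi * variances).sum(axis=1)
                   + ((X[:, None, :] - means) ** 2 / variances).sum(axis=2))

def m_step(S0, S1, S2, min_var=1e-3):
    # Turn pooled sufficient statistics into GMM parameters.
    counts = np.maximum(S0, 1e-12)            # guard against empty components
    weights = S0 / S0.sum()
    means = S1 / counts[:, None]
    variances = np.maximum(S2 / counts[:, None] - means ** 2, min_var)
    return weights, means, variances

def cv_em_gmm(X, n_components, n_folds=4, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    fold = rng.integers(n_folds, size=n)       # random partition of the data

    # Illustrative initialization: random points as means, global variance.
    means = X[rng.choice(n, n_components, replace=False)]
    variances = np.tile(X.var(axis=0), (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)

    # Per-fold sufficient statistics: zero-, first-, and second-order moments.
    S0 = np.zeros((n_folds, n_components))
    S1 = np.zeros((n_folds, n_components, d))
    S2 = np.zeros((n_folds, n_components, d))

    for it in range(n_iter):
        newS0, newS1, newS2 = np.zeros_like(S0), np.zeros_like(S1), np.zeros_like(S2)
        for k in range(n_folds):
            Xk = X[fold == k]
            if it == 0:
                # No statistics yet: bootstrap all folds from the initial model.
                w, mu, var = weights, means, variances
            else:
                # CV E-step: the model scoring fold k is built from the
                # previous iteration's statistics of all *other* folds.
                mask = np.arange(n_folds) != k
                w, mu, var = m_step(S0[mask].sum(0), S1[mask].sum(0),
                                    S2[mask].sum(0))
            logp = np.log(w) + log_gauss(Xk, mu, var)
            logp -= logp.max(axis=1, keepdims=True)   # numerical stability
            resp = np.exp(logp)
            resp /= resp.sum(axis=1, keepdims=True)   # responsibilities
            newS0[k] = resp.sum(axis=0)
            newS1[k] = resp.T @ Xk
            newS2[k] = resp.T @ (Xk ** 2)
        S0, S1, S2 = newS0, newS1, newS2
        # M-step: pool all folds' statistics to form the output model, so
        # the total work per iteration stays on the order of standard EM.
        weights, means, variances = m_step(S0.sum(0), S1.sum(0), S2.sum(0))
    return weights, means, variances

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-2.0, 1.0, (500, 2)),
                   rng.normal(+2.0, 0.5, (500, 2))])
    w, mu, var = cv_em_gmm(X, n_components=2)
    print("weights:", w)
    print("means:\n", mu)
```

The bagging-style variation mentioned in the abstract would replace the single leave-one-fold-out model per partition with an average over resampled models in the E-step, which is where the additional cost arises; that variant is not sketched here.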