The mixture of experts (ME) architecture is a powerful neural network model for supervised learning that combines a number of "expert" networks with a gating network. The expectation-maximization (EM) algorithm can be used to learn the parameters of the ME architecture, and several implementations already exist, such as the IRLS algorithm, the ECM algorithm, and an approximation to the Newton-Raphson algorithm. These implementations differ in how they train the gating network, and each results in a double-loop training procedure, i.e., an inner training loop runs within the general (outer) training loop. In this paper, we propose a least-mean-square regression method that learns or computes the gating network parameters directly, which leads to a single-loop EM algorithm (with no inner training loop) for the ME architecture. Simulation experiments demonstrate that the proposed EM algorithm outperforms the existing ones in both speed and classification accuracy.
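The idea above can be sketched in code. The following is a minimal illustration, not the paper's exact algorithm: a mixture of linear experts is fit by EM, and instead of an inner IRLS loop the gating parameters are updated by a single least-squares regression of the log-posteriors onto the inputs (the precise regression target used by the authors is an assumption here; `fit_me_em` and all variable names are hypothetical).

```python
import numpy as np

def fit_me_em(X, y, n_experts=2, n_iter=50, seed=0):
    """EM for a mixture of linear experts with Gaussian noise.

    Sketch of a single-loop scheme: the gating network is updated by
    one least-squares regression per EM iteration (an assumed concrete
    form of the paper's least-mean-square update), not by inner IRLS.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])                 # inputs with bias column
    W = rng.normal(scale=0.1, size=(n_experts, d + 1))   # expert weights
    V = np.zeros((n_experts, d + 1))                     # gating weights
    sigma2 = np.ones(n_experts)                          # expert noise variances

    for _ in range(n_iter):
        # Gating probabilities via softmax of the gating network outputs.
        g = Xb @ V.T
        g = np.exp(g - g.max(axis=1, keepdims=True))
        g /= g.sum(axis=1, keepdims=True)

        # Gaussian likelihood of y under each expert's linear prediction.
        mu = Xb @ W.T                                    # (n, K) predictions
        lik = np.exp(-(y[:, None] - mu) ** 2 / (2 * sigma2)) \
              / np.sqrt(2 * np.pi * sigma2)

        # E-step: posterior responsibility of each expert for each sample.
        h = g * lik
        h /= h.sum(axis=1, keepdims=True) + 1e-12

        # M-step for the experts: weighted (ridge-regularized) least squares.
        for k in range(n_experts):
            r = h[:, k]
            A = Xb * r[:, None]                          # responsibility-weighted design
            W[k] = np.linalg.solve(A.T @ Xb + 1e-6 * np.eye(d + 1), A.T @ y)
            res = y - Xb @ W[k]
            sigma2[k] = max((r * res ** 2).sum() / (r.sum() + 1e-12), 1e-6)

        # Gating update: one least-squares regression onto the log-posteriors,
        # replacing the inner IRLS loop of double-loop implementations.
        T = np.log(h + 1e-12)
        V = np.linalg.lstsq(Xb, T, rcond=None)[0].T

    return W, V
```

On a toy piecewise-linear data set (e.g., `y = 2x` for `x > 0` and `y = -x` otherwise), the two experts typically specialize to the two regimes while the gating network learns the split, which is the behavior the single-loop scheme must preserve while avoiding the inner iteration.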