Our main contribution is a novel model selection methodology, expectation minimization of description length (EMDL), based on the minimum description length (MDL) principle. EMDL directly addresses the combinatorial scalability issue in model selection for mixture models whose components may be of different types. The goal in such problems is to optimize both the types of the components and the number of components. The key idea in EMDL is to iterate between computing the posterior of the latent variables and minimizing the expected description length of both the observed data and the latent variables. This enables EMDL to compute the optimal model in time linear in both the number of components and the number of available component types, even though the number of model candidates grows exponentially in these quantities. We prove that EMDL is compliant with the MDL principle and inherits its statistical benefits.
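To make the iteration concrete, the following is a minimal, hypothetical sketch of an EMDL-style loop on univariate data with two candidate component types (Gaussian and Laplace). All names (`emdl_fit`, `TYPES`, the quantile initialization, the use of a weighted mean as the Laplace location, and the simple (d/2) log n parameter code standing in for the expected-description-length penalty) are our own illustrative choices, not the paper's actual formulation. The point it illustrates is structural: each component's type is chosen by an independent per-component scan, so the cost is linear in the number of types rather than exponential in the number of components.

```python
import numpy as np

# Candidate component types.  Each entry is
# (name, log-density, weighted parameter fit, number of free parameters).
def gauss_logpdf(x, th):
    mu, var = th
    return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

def gauss_fit(x, w):
    mu = np.average(x, weights=w)
    return mu, np.average((x - mu) ** 2, weights=w) + 1e-9

def laplace_logpdf(x, th):
    mu, b = th
    return -np.log(2 * b) - np.abs(x - mu) / b

def laplace_fit(x, w):
    mu = np.average(x, weights=w)  # weighted mean as a simple location fit
    return mu, np.average(np.abs(x - mu), weights=w) + 1e-9

TYPES = [("gauss", gauss_logpdf, gauss_fit, 2),
         ("laplace", laplace_logpdf, laplace_fit, 2)]

def emdl_fit(x, K, n_iter=30):
    n = len(x)
    pi = np.full(K, 1.0 / K)
    # Crude initialization: Gaussian components centred at data quantiles.
    comps = [("gauss", gauss_logpdf,
              (np.quantile(x, (k + 1) / (K + 1)), np.var(x)))
             for k in range(K)]
    for _ in range(n_iter):
        # E-step: posterior of the latent component assignments.
        logp = np.stack([np.log(pi[k]) + comps[k][1](x, comps[k][2])
                         for k in range(K)], axis=1)
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        pi = resp.mean(axis=0)
        # M-step: for each component, scan candidate types and keep the
        # one minimizing a surrogate expected description length --
        # expected negative log-likelihood plus a (d/2) log n code for
        # the parameters.  The per-component scan keeps the cost linear
        # in the number of types, not exponential in K.
        new_comps = []
        for k in range(K):
            w = resp[:, k]
            best, best_dl = None, np.inf
            for name, logpdf, fit, d in TYPES:
                th = fit(x, w)
                dl = -(w * logpdf(x, th)).sum() + 0.5 * d * np.log(n)
                if dl < best_dl:
                    best, best_dl = (name, logpdf, th), dl
            new_comps.append(best)
        comps = new_comps
    return pi, [(c[0], c[2]) for c in comps]

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-4.0, 1.0, 300), rng.laplace(4.0, 0.7, 300)])
pi, comps = emdl_fit(x, K=2)
print(pi, [(name, round(th[0], 2)) for name, th in comps])
```

On this toy data the loop recovers two well-separated components with locations near -4 and 4, selecting each component's type by the per-component description-length comparison rather than by enumerating all type combinations.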