Akaike's information criterion and recent developments in information complexity
Journal of Mathematical Psychology
Model selection for probabilistic clustering using cross-validated likelihood
Statistics and Computing
Editorial: recent developments in mixture models
Computational Statistics & Data Analysis
Pattern Recognition Letters
A Learning Scheme for Recognizing Sub-classes from Model Trained on Aggregate Classes
SSPR & SPR '08 Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition
Computational Statistics & Data Analysis
Automatic model selection by cross-validation for probabilistic PCA
Neural Processing Letters
Cognitive Systems Research
Estimation of finite mixtures with symmetric components
Statistics and Computing
Using conditional independence for parsimonious model-based Gaussian clustering
Statistics and Computing
Constrained Multilevel Latent Class Models for the Analysis of Three-Way Three-Mode Binary Data
Journal of Classification
Estimating the number of mixture components (k) is an unsolved problem. Available methods for estimating k include bootstrapping the likelihood ratio test statistic and optimizing a variety of validity functionals. We investigate minimizing the distance between the fitted mixture model and the true density as a method for estimating k. The distances considered are Kullback-Leibler (KL) and L2. We estimate these distances using cross validation. A reliable estimate of k is obtained by voting among B estimates of k, each corresponding to one of B cross-validation estimates of the distance. With the KL distance, this estimation method is very similar to the Monte Carlo cross-validated likelihood method discussed by Smyth (Statist. Computing 10(1) (2000) 63). Focusing on univariate normal mixtures, we present simulation studies that compare the cross-validated distance method with Akaike's Information Criterion (AIC), the Bayesian Information Criterion/Minimum Description Length criterion (BIC/MDL), and Information Complexity (ICOMP). We also apply the cross-validation estimate of distance, along with AIC, BIC/MDL and ICOMP, to data from an osteoporosis drug trial in order to find groups that respond differentially to treatment. In our closing remarks, we highlight the general applicability of our method for choosing among any set of estimators of a particular parameter of interest, assuming an approximately unbiased estimator is available.
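The voting procedure described in the abstract can be sketched roughly as follows for the KL case. The KL distance between the true density f and a fitted mixture g_k equals E_f[log f] minus E_f[log g_k], and the first term does not depend on k, so minimizing the estimated KL distance amounts to maximizing the mean held-out log-likelihood. This is a minimal illustration and not the authors' implementation: the function names (`fit_em`, `heldout_loglik`, `vote_for_k`), the quantile-block EM initialization, the 50/50 split, the values of B and `k_max`, and the simulated data are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, params):
    # params is a list of (weight, mean, sd) tuples
    return sum(w * normal_pdf(x, mu, s) for w, mu, s in params)

def fit_em(data, k, n_iter=50, min_sigma=1e-2):
    """Fit a k-component univariate normal mixture by plain EM (illustrative)."""
    xs = sorted(data)
    n = len(xs)
    # Assumed initialization: one component per equal-sized quantile block.
    params = []
    for j in range(k):
        block = xs[j * n // k:(j + 1) * n // k]
        mu = sum(block) / len(block)
        var = sum((x - mu) ** 2 for x in block) / len(block)
        params.append((1.0 / k, mu, max(math.sqrt(var), min_sigma)))
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, mu, s) for w, mu, s in params]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: reweighted weights, means and standard deviations
        new_params = []
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            mu = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nj
            # Floor the sd to avoid degenerate, zero-variance components.
            new_params.append((nj / n, mu, max(math.sqrt(var), min_sigma)))
        params = new_params
    return params

def heldout_loglik(test, params):
    # Mean held-out log-likelihood; its negative estimates KL up to a constant.
    return sum(math.log(max(mixture_pdf(x, params), 1e-300)) for x in test) / len(test)

def vote_for_k(data, k_max=3, B=10, test_frac=0.5, seed=0):
    """Return the k with the most votes over B random train/test splits."""
    rng = random.Random(seed)
    votes = Counter()
    n_test = int(len(data) * test_frac)
    for _ in range(B):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores = {k: heldout_loglik(test, fit_em(train, k))
                  for k in range(1, k_max + 1)}
        votes[max(scores, key=scores.get)] += 1  # this split's vote
    return votes.most_common(1)[0][0], votes

# Simulated example data (hypothetical): two well-separated normal components.
data_rng = random.Random(1)
data = ([data_rng.gauss(-4.0, 1.0) for _ in range(150)]
        + [data_rng.gauss(4.0, 1.0) for _ in range(150)])
k_hat, votes = vote_for_k(data)
print("estimated k:", k_hat, "votes:", dict(votes))
```

Each split casts one vote for the k with the smallest estimated distance, and the final estimate is the most frequent vote. An L2 version would replace `heldout_loglik` with a cross-validated estimate of the integrated squared error; that variant is not shown here.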