Mixture-model cluster analysis using information theoretical criteria

Authors:
Jaime R. S. Fonseca;Margarida G. M. S. Cardoso
Affiliations:
(Correspd. Tel. +351 213 619 430 (3179) / Fax. +351 213 619 430) ISCSP-Insto. Sup. de Ciências Soc. e Políticas, R. Almerindo Lessa, Pólo Universitário do Alto da Ajuda, 1349-05 ...;ISCTE - Business School, Department of Quantitative Methods, Av. das Forças Armadas, 1649-026 Lisboa, Portugal. E-mail: margarida.cardoso@iscte.pt
Venue:
Intelligent Data Analysis
Year:
2007

Abstract

The estimation of mixture models has long been proposed as an approach to cluster analysis, and several variants of the Expectation-Maximization (EM) algorithm are currently available for this purpose. Estimating a mixture model simultaneously determines the number of clusters and yields distributional parameters for the clustering base variables. Several information criteria help support the selection of a particular model, or clustering structure. However, it remains unclear which specific criteria are most suitable for particular applications. In the present work we analyze the relationship between the performance of information criteria and the measurement type of the clustering variables. To study this relationship, we analyze forty-two data sets with known clustering structure whose clustering variables are categorical, continuous, or of mixed type. We then compare eleven information-based criteria on their ability to recover the data sets' clustering structures. As a result, we select the AIC3, BIC, and ICL-BIC criteria as the best candidates for selecting models with categorical, continuous, and mixed-type clustering variables, respectively.
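To make the three recommended criteria concrete, here is a minimal sketch (not the authors' code) computing AIC, AIC3, BIC, and ICL-BIC from a fitted mixture's maximized log-likelihood, its number of free parameters, the sample size, and the posterior membership probabilities. It uses the common "smaller is better" forms: AIC3 replaces AIC's penalty of 2 per parameter with 3, and ICL-BIC adds twice the classification entropy EN(tau) to BIC. The parameter-count helpers assume an unrestricted (full-covariance) Gaussian mixture for continuous variables and a latent class model under local independence for categorical ones. All function names are hypothetical, and only four of the study's eleven criteria are shown.

```python
import math
from typing import Dict, Mapping, Sequence


def gaussian_mixture_params(k: int, d: int) -> int:
    """Free parameters of a K-component Gaussian mixture in d dimensions
    (assumed: unrestricted covariances): weights, means, covariances."""
    return (k - 1) + k * d + k * d * (d + 1) // 2


def latent_class_params(k: int, n_categories: Sequence[int]) -> int:
    """Free parameters of a K-class latent class model (assumed: local
    independence): weights plus (L_j - 1) probabilities per variable and class."""
    return (k - 1) + k * sum(c - 1 for c in n_categories)


def entropy_term(posteriors: Sequence[Sequence[float]]) -> float:
    """EN(tau) = -sum_i sum_k tau_ik * ln(tau_ik); zero probabilities add nothing."""
    total = 0.0
    for row in posteriors:
        for tau in row:
            if tau > 0.0:
                total -= tau * math.log(tau)
    return total


def information_criteria(log_lik: float, n_params: int, n_obs: int,
                         posteriors: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Four information criteria, each in 'smaller is better' form."""
    minus_two_ll = -2.0 * log_lik
    bic = minus_two_ll + n_params * math.log(n_obs)
    return {
        "AIC": minus_two_ll + 2.0 * n_params,
        "AIC3": minus_two_ll + 3.0 * n_params,
        "BIC": bic,
        # ICL-BIC penalizes BIC by twice the classification entropy,
        # so fuzzy, overlapping partitions are penalized more heavily.
        "ICL-BIC": bic + 2.0 * entropy_term(posteriors),
    }


def select_k(scores: Mapping[int, Mapping[str, float]], criterion: str) -> int:
    """Return the number of clusters K whose fitted model minimizes `criterion`."""
    return min(scores, key=lambda k: scores[k][criterion])
```

In use, one would fit the mixture by EM for each candidate number of clusters K, pass each fit's maximized log-likelihood and posterior probabilities to `information_criteria`, and then call `select_k` with "AIC3", "BIC", or "ICL-BIC" according to the variables' measurement type. Conventions vary across texts (for instance, some compute the ICL entropy from hard MAP assignments rather than posterior probabilities), so the numbers here may differ from a given software package.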