Motivation: In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint, whereas others are developed via the application of mixture models. In this article, a family of eight mixture models that utilize the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation–maximization algorithm, and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets.

Results: The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data.

Availability: The reduced, preprocessed data that were analysed are available at www.paulmcnicholas.info

Contact: pmcnicho@uoguelph.ca
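
For readers unfamiliar with the adjusted Rand index used in the Results, the following is a minimal sketch of how it can be computed from two label vectors, using the standard pair-counting (Hubert and Arabie) form. It is not the authors' code: the function name `adjusted_rand_index` and the toy labels are illustrative assumptions, and the labels are not taken from the gene expression data sets.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items.

    Illustrative helper (hypothetical name); uses the standard
    pair-counting form corrected for chance agreement.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        # No pairs to compare; treat the partitions as trivially identical.
        return 1.0

    # Contingency counts: items sharing each (true, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs placed together under each view.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total_pairs = comb(n, 2)

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2         # largest attainable agreement
    if max_index == expected:
        # Degenerate case, e.g. both partitions use a single group.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Toy example: hypothetical known classes versus hypothetical cluster labels.
    known = [0, 0, 1, 2]
    found = [0, 0, 1, 1]
    print(adjusted_rand_index(known, found))
```

A value of 1 indicates perfect agreement between the clustering and the reference classes, and the expected value under random assignment is 0. The index is invariant to relabelling of the clusters, which is why it suits comparisons between mixture-model output and known sample classes.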