Mixtures of common t-factor analyzers for clustering high-dimensional microarray data

Authors:
Jangsun Baek;Geoffrey J. McLachlan
Venue:
Bioinformatics
Year:
2011

Cited by (8):
The infinite Student's t-factor mixture analyzer for robust clustering and classification. Pattern Recognition.
SC³: Triple Spectral Clustering-Based Consensus Clustering Framework for Class Discovery from Cancer Gene Expression Profiles. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
Mixtures of common factor analyzers for high-dimensional data with missing information. Journal of Multivariate Analysis.
Dimension reduction for model-based clustering via mixtures of multivariate t-distributions. Advances in Data Analysis and Classification.
Model-based clustering via linear cluster-weighted models. Computational Statistics & Data Analysis.
Parsimonious skew mixture models for model-based clustering and classification. Computational Statistics & Data Analysis.
Automated learning of factor analysis with complete and incomplete data. Computational Statistics & Data Analysis.
Subspace clustering of high-dimensional data: a predictive approach. Data Mining and Knowledge Discovery.


Abstract

Motivation: Mixtures of factor analyzers enable model-based clustering of high-dimensional microarray data, where the number of observations n is small relative to the number of genes p. When the number of clusters is not small, for example when there are several different types of cancer, it may be necessary to reduce further the number of parameters in the specification of the component-covariance matrices. This reduction can be achieved with mixtures of factor analyzers with common component-factor loadings (MCFA), a more parsimonious model. However, the MCFA approach is sensitive to both non-normality and outliers, which are commonly observed in microarray experiments. The sensitivity arises because the underlying mixture model assumes the multivariate normal family of distributions for the component-error and factor distributions.

Results: An extension to mixtures of t-factor analyzers with common component-factor loadings is considered, in which the multivariate t-family is adopted for the component-error and factor distributions. An EM algorithm is developed for fitting mixtures of common t-factor analyzers. The model can handle data with tails longer than those of the normal distribution, is robust against outliers and allows the data to be displayed in low-dimensional plots. It is applied to both synthetic data and microarray gene expression data for clustering, where it performs better than several existing methods.

Availability: The algorithms were implemented in Matlab. The Matlab code is available at http://blog.naver.com/aggie100.

Contact: jbaek@jnu.ac.kr

Supplementary information: Supplementary data are available at Bioinformatics online.
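
The generative structure described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' Matlab code. It assumes the usual normal scale-mixture representation of the multivariate t-distribution, in which one gamma-distributed weight per observation scales both the factors and the errors. All names (sample_mctfa, A, xi, omega, d, nu) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def sample_mctfa(n, pi, A, xi, omega, d, nu, rng):
    """Draw n observations from a mixture of common t-factor analyzers.

    Hypothetical notation (not taken from the abstract):
    pi:    (g,) mixing proportions
    A:     (p, q) factor loadings shared by all components
    xi:    (g, q) component factor means
    omega: (g, q, q) component factor covariance matrices
    d:     (p,) diagonal of the error covariance shared by all components
    nu:    (g,) component degrees of freedom
    """
    g = len(pi)
    p, q = A.shape
    labels = rng.choice(g, size=n, p=pi)
    # Gamma(nu/2, rate nu/2) weights; a small weight yields an outlying point.
    w = rng.gamma(shape=nu[labels] / 2.0, scale=2.0 / nu[labels])
    factors = np.empty((n, q))
    for i in range(g):
        idx = np.flatnonzero(labels == i)
        chol = np.linalg.cholesky(omega[i])
        z = rng.standard_normal((idx.size, q)) @ chol.T
        # Dividing by sqrt(w) turns the normal draw into a t-distributed one.
        factors[idx] = xi[i] + z / np.sqrt(w[idx])[:, None]
    # Errors share the same weight, so factors and errors are jointly t.
    errors = rng.standard_normal((n, p)) * np.sqrt(d) / np.sqrt(w)[:, None]
    y = factors @ A.T + errors
    return y, factors, labels


rng = np.random.default_rng(0)
p, q, g = 50, 2, 3                       # many variables, few factors
A = rng.standard_normal((p, q))          # common loadings
xi = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
omega = np.stack([np.eye(q)] * g)
d = np.full(p, 0.5)
nu = np.array([3.0, 3.0, 3.0])           # low degrees of freedom: heavy tails
pi = np.array([0.4, 0.3, 0.3])

y, u, labels = sample_mctfa(200, pi, A, xi, omega, d, nu, rng)
print(y.shape, u.shape)
```

Because the loadings A are shared across components, the q-dimensional factors u carry the cluster structure, which is what allows the p-dimensional data to be displayed in low-dimensional plots. Fitting the model to data would require the EM algorithm mentioned in the abstract, which this sketch does not attempt.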