On a resampling approach for tests on the number of clusters with mixture model-based clustering of tissue samples

Authors:
G. J. McLachlan; N. Khan
Affiliations:
Department of Mathematics, University of Queensland, St. Lucia, Queensland, Brisbane 4072, Australia; Institute for Molecular BioScience, University of Queensland, St. Lucia, Queensland, Brisbane 4072, Australia
Venue:
Journal of Multivariate Analysis
Year:
2004

Abstract

We consider the problem of assessing the number of clusters in a limited number of tissue samples containing gene expression measurements for possibly several thousand genes. We propose a normal mixture model-based approach to clustering the tissue samples. One advantage of this approach is that the question of the number of clusters in the data can be formulated as a test on the smallest number of components in the mixture model compatible with the data. This test can be carried out using the likelihood ratio test statistic, with resampling used to assess its null distribution. The effectiveness of this approach is demonstrated on simulated data and on some microarray datasets considered previously in the bioinformatics literature.
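The resampling test described above can be illustrated with a small sketch. This is not the authors' implementation. It assumes univariate data for brevity (the paper concerns high-dimensional tissue samples), fits normal mixtures with a plain EM algorithm from a few random starts, and uses a parametric bootstrap. In that bootstrap, samples are drawn from the fitted g0-component model to approximate the null distribution of -2 log λ for H0: g = g0 versus H1: g = g0 + 1. The function names, the 19 replicates, the variance floor and the iteration limits are all hypothetical, illustrative choices.

```python
import math
import random


def log_normal_pdf(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_loglik(data, weights, means, variances):
    """Observed-data log likelihood of a univariate normal mixture."""
    return sum(
        log_sum_exp([math.log(w) + log_normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)])
        for x in data
    )


def fit_mixture(data, g, rng, n_starts=3, max_iter=60, tol=1e-5):
    """Fit a g-component univariate normal mixture by EM from several random
    starts; return (loglik, weights, means, variances) of the best run."""
    n = len(data)
    mean = sum(data) / n
    total_var = sum((x - mean) ** 2 for x in data) / n
    var_floor = 1e-3 * total_var  # hypothetical floor: stops components collapsing
    best = None
    for _ in range(n_starts):
        weights, means, variances = [1.0 / g] * g, rng.sample(data, g), [total_var] * g
        prev = -math.inf
        for _ in range(max_iter):
            # E-step: posterior probability that each point belongs to each component.
            resp = []
            for x in data:
                logs = [math.log(w) + log_normal_pdf(x, m, v)
                        for w, m, v in zip(weights, means, variances)]
                lse = log_sum_exp(logs)
                resp.append([math.exp(l - lse) for l in logs])
            # M-step: weighted updates of mixing proportions, means and variances.
            for j in range(g):
                nj = sum(r[j] for r in resp) + 1e-12
                weights[j] = nj / n
                means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
                variances[j] = max(var_floor, sum(r[j] * (x - means[j]) ** 2
                                                  for r, x in zip(resp, data)) / nj)
            ll = mixture_loglik(data, weights, means, variances)
            if ll - prev < tol:  # stop once the likelihood has stopped increasing
                break
            prev = ll
        if best is None or ll > best[0]:
            best = (ll, list(weights), list(means), list(variances))
    return best


def sample_mixture(n, weights, means, variances, rng):
    """Draw n observations from a fitted univariate normal mixture."""
    comps = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[c], math.sqrt(variances[c])) for c in comps]


def bootstrap_lrt(data, g0, n_boot=19, seed=0):
    """Parametric bootstrap test of H0: g = g0 versus H1: g = g0 + 1.
    Returns the observed -2 log lambda and the bootstrap p-value."""
    rng = random.Random(seed)

    def lrt_stat(sample):
        fit0 = fit_mixture(sample, g0, rng)
        fit1 = fit_mixture(sample, g0 + 1, rng)
        return max(0.0, 2.0 * (fit1[0] - fit0[0])), fit0

    observed, null_fit = lrt_stat(data)
    _, w, m, v = null_fit
    # Replicates are generated under H0, i.e. from the fitted g0-component model.
    boot = [lrt_stat(sample_mixture(len(data), w, m, v, rng))[0] for _ in range(n_boot)]
    # Proportion of replicates (counting the observed value) at least as large.
    p_value = (1 + sum(b >= observed for b in boot)) / (n_boot + 1)
    return observed, p_value


if __name__ == "__main__":
    data_rng = random.Random(42)
    bimodal = ([data_rng.gauss(-4, 1) for _ in range(40)]
               + [data_rng.gauss(4, 1) for _ in range(40)])
    stat, p = bootstrap_lrt(bimodal, g0=1)
    print(f"-2 log lambda = {stat:.1f}, bootstrap p-value = {p:.2f}")
```

With 19 replicates, the smallest attainable p-value is 1/20 = 0.05. More replicates give a finer p-value at the cost of refitting both models once per replicate.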