Cluster analysis of high-dimensional data: a case study

Authors:
Richard Bean;Geoff McLachlan
Affiliations:
ARC Centre in Bioinformatics, Institute for Molecular Bioscience, UQ;ARC Centre in Bioinformatics, Institute for Molecular Bioscience, UQ
Venue:
IDEAL'05 Proceedings of the 6th international conference on Intelligent Data Engineering and Automated Learning
Year:
2005

Abstract

Normal mixture models are often used to cluster continuous data. However, conventional approaches for fitting these models will have problems in producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.
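
The estimation problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use the Cabernet wine data. The sizes `n`, `p`, `q`, `k` and the random data are hypothetical, chosen only so that the dimension exceeds the number of observations. The sketch shows that the unrestricted sample covariance matrix is singular in that setting, that the covariance of a few principal component scores is not, and that a factor-analytic covariance of the form B Bᵀ + D (B a p × q loading matrix, D diagonal with positive entries) is nonsingular while using far fewer parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n observations of dimension p (p > n),
# q latent factors, k retained principal components.
n, p, q, k = 10, 30, 2, 3
X = rng.standard_normal((n, p))  # synthetic data, for illustration only

# Unrestricted sample covariance (p x p): rank is at most n - 1,
# so it is singular whenever p >= n.
S = np.cov(X, rowvar=False)
rank_S = np.linalg.matrix_rank(S)

# PCA route: project the centred data onto the first k principal axes.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                      # n x k matrix of component scores
rank_Z = np.linalg.matrix_rank(np.cov(Z, rowvar=False))

# Factor-analytic route: Sigma = B B^T + D is positive definite
# for any loading matrix B when D has positive diagonal entries.
B = rng.standard_normal((p, q))
D = np.diag(rng.uniform(0.5, 1.5, size=p))
Sigma = B @ B.T + D
min_eig = np.linalg.eigvalsh(Sigma).min()


def n_cov_params_full(p):
    """Free parameters in an unrestricted p x p covariance matrix."""
    return p * (p + 1) // 2


def n_cov_params_factor(p, q):
    """Free parameters in B B^T + D, after removing the q(q-1)/2
    rotational indeterminacies in B."""
    return p * q + p - q * (q - 1) // 2


print("rank of sample covariance:", rank_S, "of", p)
print("rank of PC-score covariance:", rank_Z, "of", k)
print("smallest eigenvalue of B B^T + D:", min_eig)
print("parameters, full vs factor:",
      n_cov_params_full(p), n_cov_params_factor(p, q))
```

In a mixture model the same restriction would be applied to each component-covariance matrix separately, which is what keeps the per-component estimates nonsingular when each cluster contains few observations.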