Modelling high-dimensional data by mixtures of factor analyzers

  • Authors:
  • G. J. McLachlan;D. Peel;R. W. Bean

  • Affiliations:
  • Department of Mathematics, University of Queensland, St. Lucia, Brisbane 4072, Australia;Department of Mathematics, University of Queensland, St. Lucia, Brisbane 4072, Australia;Department of Mathematics, University of Queensland, St. Lucia, Brisbane 4072, Australia

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2003

Abstract

We focus on mixtures of factor analyzers as a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. Working in this reduced space allows each component-covariance matrix to be modelled with a complexity lying between that of the isotropic and full covariance structure models. We illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expressions from microarray experiments.
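
To make the parameter-count claim concrete, the following is a minimal sketch, not taken from the abstract itself. It assumes the usual factor-analytic form for each component covariance, Sigma = B B^T + D, with B a p x q loading matrix and D diagonal, for which the standard count of free parameters is pq + p - q(q-1)/2 (the subtraction accounts for the rotational indeterminacy of B). The function names and the values p = 1000, q = 5 are hypothetical and chosen only for illustration.

```python
def isotropic_cov_params(p: int) -> int:
    """Free parameters in an isotropic covariance sigma^2 * I: a single variance."""
    return 1


def full_cov_params(p: int) -> int:
    """Free parameters in an unrestricted symmetric p x p covariance matrix."""
    return p * (p + 1) // 2


def factor_cov_params(p: int, q: int) -> int:
    """Free parameters in B B^T + D (assumed form): p*q loadings plus p
    diagonal variances, minus q(q-1)/2 for rotational indeterminacy of B."""
    if not 0 <= q <= p:
        raise ValueError("q must satisfy 0 <= q <= p")
    return p * q + p - q * (q - 1) // 2


if __name__ == "__main__":
    p, q = 1000, 5  # hypothetical dimensions, e.g. many genes and few factors
    print(isotropic_cov_params(p))  # 1
    print(factor_cov_params(p, q))  # 5990
    print(full_cov_params(p))       # 500500
```

Under these assumptions, a small q keeps the per-component count close to linear in p, whereas the full covariance grows quadratically, which is why the factor dimension acts as the complexity control when p is large relative to n.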