Dimension reduction for model-based clustering

Authors:
Luca Scrucca
Affiliations:
Dipartimento di Economia, Finanza e Statistica, Università degli Studi di Perugia, Perugia, Italy
Venue:
Statistics and Computing
Year:
2010



Abstract

We introduce a dimension reduction method for visualizing the clustering structure obtained from a finite mixture of Gaussian densities. Information on the dimension reduction subspace is obtained from the variation in group means and, depending on the estimated mixture model, from the variation in group covariances. The method reduces dimensionality by identifying a set of linear combinations of the original features, ordered by importance as quantified by the associated eigenvalues, which capture most of the cluster structure contained in the data. Observations may then be projected onto this reduced subspace, providing summary plots that help to visualize the clustering structure. These plots can be particularly appealing in the case of high-dimensional data and noisy structure. The newly constructed variables capture most of the clustering information available in the data, and they can be further reduced to improve clustering performance. We illustrate the approach on both simulated and real data sets.
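The idea of ordering linear combinations by their eigenvalues can be illustrated with a minimal sketch. This is not the estimator of the paper: it is a simplified variant that uses only the variation in group means (ignoring group covariances) and assumes that directions are scaled relative to the overall sample covariance. The function name `mean_variation_directions`, the simulated data, and the use of known labels in place of labels from a fitted Gaussian mixture are all assumptions made for illustration.

```python
import numpy as np

def mean_variation_directions(X, labels):
    """Hypothetical helper: directions ordered by eigenvalue, built from the
    variation in group means only (a simplification, see lead-in)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, p = X.shape
    overall_mean = X.mean(axis=0)
    # Overall covariance used to scale the directions (assumed choice).
    Sigma = np.cov(X, rowvar=False, bias=True)

    # Kernel matrix: weighted outer products of centered group means.
    M = np.zeros((p, p))
    for g in np.unique(labels):
        Xg = X[labels == g]
        d = Xg.mean(axis=0) - overall_mean
        M += (len(Xg) / n) * np.outer(d, d)

    # Solve M v = lambda * Sigma v by whitening with Sigma^{-1/2}.
    evals_S, evecs_S = np.linalg.eigh(Sigma)
    S_inv_sqrt = evecs_S @ np.diag(evals_S ** -0.5) @ evecs_S.T
    lam, U = np.linalg.eigh(S_inv_sqrt @ M @ S_inv_sqrt)

    # Order directions by decreasing eigenvalue (importance).
    order = np.argsort(lam)[::-1]
    return lam[order], S_inv_sqrt @ U[:, order]

# Simulated example: 3 groups in 5 dimensions, separated only in the
# first two coordinates. Labels are known here; in practice they would
# come from a fitted mixture model.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0, 0],
                    [5, 0, 0, 0, 0],
                    [0, 5, 0, 0, 0]], dtype=float)
X = np.vstack([rng.normal(size=(100, 5)) + c for c in centers])
labels = np.repeat(np.arange(3), 100)

lam, V = mean_variation_directions(X, labels)
# Project observations onto the two leading directions for plotting.
Z = (X - X.mean(axis=0)) @ V[:, :2]
```

With three groups the mean-based kernel has rank at most two, so only the first two eigenvalues are non-negligible and `Z` holds the coordinates one would plot. Capturing differences in group covariances, as the abstract describes, would require an additional term in the kernel matrix that this sketch omits.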