Model-based clustering of high-dimensional data: A review

  • Authors:
  • Charles Bouveyron;Camille Brunet-Saumard

  • Affiliations:
  • Laboratoire SAMM, EA 4543, Université Paris 1 Panthéon-Sorbonne, France;Laboratoire LAREMA, UMR CNRS 6093, Université d'Angers, France

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2014

Quantified Score

Hi-index 0.03

Visualization

Abstract

Model-based clustering is a popular tool which is renowned for its probabilistic foundations and its flexibility. However, high-dimensional data are nowadays more and more frequent and, unfortunately, classical model-based clustering techniques show a disappointing behavior in high-dimensional spaces. This is mainly due to the fact that model-based clustering methods are dramatically over-parametrized in this case. However, high-dimensional spaces have specific characteristics which are useful for clustering and recent techniques exploit those characteristics. After having recalled the bases of model-based clustering, dimension reduction approaches, regularization-based techniques, parsimonious modeling, subspace clustering methods and clustering methods based on variable selection are reviewed. Existing softwares for model-based clustering of high-dimensional data will be also reviewed and their practical use will be illustrated on real-world data sets.