Model-based subspace clustering of non-Gaussian data

  • Authors:
  • Sabri Boutemedjet;Djemel Ziou;Nizar Bouguila

  • Affiliations:
  • Département d'Informatique, Université de Sherbrooke, Sherbrooke, QC, Canada J1K 2R1;Département d'Informatique, Université de Sherbrooke, Sherbrooke, QC, Canada J1K 2R1;Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, Canada H3G 2W1

  • Venue:
  • Neurocomputing
  • Year:
  • 2010

Abstract

This paper presents a new generalized Dirichlet (GD) mixture model to address the challenging problem of clustering multidimensional data sets on different feature subsets. We approximate the class-conditional distributions of the mixture components to define a binary relevance of each feature at the level of individual clusters. We consider a feature relevant to a cluster if it provides the information needed to assign data points to that cluster. We then define a new message length objective to learn the model and to select both the feature subsets and the number of components. The proposed method is more general than existing feature selection and subspace clustering models. In addition, it selects for each cluster only relevant and statistically independent features, in time linear in the number of observations and dimensions. Experiments on synthetic data and on unsupervised image categorization show the merits of our approach.
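
As background for the GD distribution named in the abstract, the sketch below evaluates a generalized Dirichlet log-density in its standard (Connor-Mosimann) form and shows the well-known change of variables that turns a GD vector into independent Beta-distributed coordinates. It is only an illustration under stated assumptions: the abstract does not give the parameterization or say how independence between features is obtained, and the function names (`gd_log_density`, `to_independent_betas`, `gd_log_density_via_betas`) are hypothetical, not taken from the paper.

```python
import math


def log_beta_fn(a, b):
    """Log of the Beta function B(a, b), computed with log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def gd_log_density(x, alpha, beta):
    """Log-density of a generalized Dirichlet vector x (x_d > 0, sum(x) < 1).

    Assumed standard form:
      p(x) = prod_d x_d^(alpha_d - 1) * (1 - sum_{l<=d} x_l)^gamma_d / B(alpha_d, beta_d)
      gamma_d = beta_d - alpha_{d+1} - beta_{d+1} for d < D, and gamma_D = beta_D - 1.
    """
    dims = len(x)
    total, cumulative = 0.0, 0.0
    for d in range(dims):
        cumulative += x[d]
        if d < dims - 1:
            gamma = beta[d] - alpha[d + 1] - beta[d + 1]
        else:
            gamma = beta[d] - 1.0
        total += (alpha[d] - 1.0) * math.log(x[d])
        total += gamma * math.log(1.0 - cumulative)
        total -= log_beta_fn(alpha[d], beta[d])
    return total


def to_independent_betas(x):
    """Map x to z with z_1 = x_1 and z_d = x_d / (1 - sum_{l<d} x_l).

    Under a GD distribution, each z_d follows Beta(alpha_d, beta_d)
    independently of the other coordinates.
    """
    z, remaining = [], 1.0
    for value in x:
        z.append(value / remaining)
        remaining -= value
    return z


def gd_log_density_via_betas(x, alpha, beta):
    """Same log-density, written as independent Beta terms plus a Jacobian term."""
    total, remaining = 0.0, 1.0
    for d, z_d in enumerate(to_independent_betas(x)):
        total += (alpha[d] - 1.0) * math.log(z_d)
        total += (beta[d] - 1.0) * math.log(1.0 - z_d)
        total -= log_beta_fn(alpha[d], beta[d])
        total -= math.log(remaining)  # log |dz_d / dx_d| = -log(1 - sum_{l<d} x_l)
        remaining -= x[d]
    return total
```

Both functions should agree on any valid input, for example `x = [0.2, 0.3, 0.1]` with `alpha = [2.0, 3.0, 1.5]` and `beta = [4.0, 2.0, 2.5]`. The second form makes the per-feature factorization explicit, which is the kind of structure a per-cluster, per-feature relevance decision can exploit.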