MML-Based Approach for High-Dimensional Unsupervised Learning Using the Generalized Dirichlet Mixture

  • Authors:
  • Nizar Bouguila;Djemel Ziou

  • Affiliations:
  • Universite de Sherbrooke;Universite de Sherbrooke

  • Venue:
  • CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops - Volume 03

  • Year:
  • 2005

Abstract

We consider the problem of determining the structure of high-dimensional data without prior knowledge of the number of clusters. The data are represented by a finite mixture model based on the generalized Dirichlet distribution. This distribution has a more general covariance structure than the Dirichlet distribution and offers high flexibility and ease of use for approximating both symmetric and asymmetric distributions. In addition, its mathematical properties allow high-dimensional modeling without dimensionality reduction, and thus without loss of information. The number of clusters is determined using the minimum message length (MML) principle. Parameter estimation is done by a hybrid stochastic expectation-maximization (HSEM) algorithm. The results are compared with those obtained using other selection criteria (AIC, MDL, and MMDL). The performance of our method is tested on real-data clustering and on an image object recognition problem.
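To make the relationship between the generalized Dirichlet and the Dirichlet distribution concrete, the following is a minimal sketch of the generalized Dirichlet log-density in its standard Connor-Mosimann form. It is not the authors' implementation. The function name `gen_dirichlet_logpdf` and the parameter names `alpha` and `beta` are assumptions chosen for illustration, and the sketch covers a single mixture component only, not the MML criterion or the HSEM estimation.

```python
import math


def gen_dirichlet_logpdf(x, alpha, beta):
    """Log-density of a generalized Dirichlet distribution (illustrative sketch).

    x     : d positive values with sum(x) < 1
    alpha : d positive shape parameters (hypothetical name)
    beta  : d positive shape parameters (hypothetical name)
    """
    d = len(x)
    if len(alpha) != d or len(beta) != d:
        raise ValueError("x, alpha and beta must have the same length")

    logp = 0.0
    cumulative = 0.0
    for i in range(d):
        cumulative += x[i]
        remaining = 1.0 - cumulative  # 1 - (x_1 + ... + x_i)
        if x[i] <= 0.0 or remaining <= 0.0:
            return float("-inf")  # point lies outside the support

        # Exponent of the "remaining mass" term for coordinate i.
        if i < d - 1:
            gamma = beta[i] - alpha[i + 1] - beta[i + 1]
        else:
            gamma = beta[i] - 1.0

        # Log of the normalizing factor Gamma(a+b) / (Gamma(a) Gamma(b)).
        logp += (math.lgamma(alpha[i] + beta[i])
                 - math.lgamma(alpha[i]) - math.lgamma(beta[i]))
        logp += (alpha[i] - 1.0) * math.log(x[i]) + gamma * math.log(remaining)
    return logp


# With beta[i] = alpha[i+1] + beta[i+1], the density reduces to an ordinary
# Dirichlet; here alpha=[2, 3], beta=[7, 4] corresponds to Dirichlet(2, 3, 4).
print(math.exp(gen_dirichlet_logpdf([0.2, 0.3], [2.0, 3.0], [7.0, 4.0])))
```

The generalized form has 2d free parameters for d coordinates, compared with d + 1 for the Dirichlet, which is what gives it the more general covariance structure mentioned in the abstract.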