Variational learning of a Dirichlet process of generalized Dirichlet distributions for simultaneous clustering and feature selection

  • Authors:
  • Wentao Fan;Nizar Bouguila

  • Affiliations:
  • -;-

  • Venue:
  • Pattern Recognition
  • Year:
  • 2013

Quantified Score

Hi-index 0.01

Visualization

Abstract

This paper introduces a novel enhancement for unsupervised feature selection based on generalized Dirichlet (GD) mixture models. Our proposal is based on the extension of the finite mixture model previously developed in [1] to the infinite case, via the consideration of Dirichlet process mixtures, which can be viewed actually as a purely nonparametric model since the number of mixture components can increase as data are introduced. The infinite assumption is used to avoid problems related to model selection (i.e. determination of the number of clusters) and allows simultaneous separation of data in to similar clusters and selection of relevant features. Our resulting model is learned within a principled variational Bayesian framework that we have developed. The experimental results reported for both synthetic data and real-world challenging applications involving image categorization, automatic semantic annotation and retrieval show the ability of our approach to provide accurate models by distinguishing between relevant and irrelevant features without over- or under-fitting the data.