On multivariate binary data clustering and feature weighting

  • Authors: Nizar Bouguila
  • Affiliations: Concordia Institute for Information Systems Engineering, Faculty of Engineering and Computer Science, Concordia University, Montreal, Qc, Canada H3G 2W1
  • Venue: Computational Statistics & Data Analysis
  • Year: 2010

Quantified Score

Hi-index 0.04

Abstract

This paper presents an approach for partitioning data sets of unlabeled binary vectors without a priori information about the number of clusters or the saliency of the features. The unsupervised binary feature selection problem is addressed using finite mixture models of multivariate Bernoulli distributions. Using stochastic complexity, the proposed model simultaneously determines the number of clusters in a given data set of binary vectors and the saliency of the features used. We present several applications involving real data, document classification, and image categorization to show the merits of the proposed approach.
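
To make the underlying model concrete, here is a minimal sketch of plain EM for a finite mixture of multivariate Bernoulli distributions. It is not the paper's algorithm: it assumes the number of clusters `k` is fixed in advance, treats every feature as equally relevant, and omits the stochastic-complexity criterion used to choose the number of clusters and the feature saliencies. The function name `bernoulli_mixture_em` and all of its parameters are hypothetical, chosen only for illustration.

```python
import math
import random


def bernoulli_mixture_em(X, k, n_iter=50, seed=0, eps=1e-6):
    """Fit a k-component multivariate Bernoulli mixture to binary rows X.

    Illustrative only: k is fixed, and there is no feature weighting or
    model selection. Returns (weights, theta, labels).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    weights = [1.0 / k] * k
    # Random initial Bernoulli parameters, kept away from 0 and 1.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    resp = [[0.0] * k for _ in range(n)]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each vector,
        # computed in log space to avoid underflow.
        for i, x in enumerate(X):
            logp = []
            for j in range(k):
                lp = math.log(weights[j])
                for l in range(d):
                    t = theta[j][l]
                    lp += math.log(t) if x[l] else math.log(1.0 - t)
                logp.append(lp)
            m = max(logp)
            unnorm = [math.exp(v - m) for v in logp]
            s = sum(unnorm)
            resp[i] = [u / s for u in unnorm]

        # M-step: re-estimate mixing weights and Bernoulli parameters.
        for j in range(k):
            nj = sum(resp[i][j] for i in range(n))
            weights[j] = max(nj / n, eps)
            for l in range(d):
                num = sum(resp[i][j] * X[i][l] for i in range(n))
                # Clip so that log(t) and log(1 - t) stay finite.
                theta[j][l] = min(max(num / max(nj, eps), eps), 1.0 - eps)
        total = sum(weights)
        weights = [w / total for w in weights]

    # Hard assignment: most probable component for each vector.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, theta, labels


if __name__ == "__main__":
    # Toy data: two groups of binary vectors with opposite patterns.
    data = [[1, 1, 1, 0, 0, 0]] * 4 + [[0, 0, 0, 1, 1, 1]] * 4
    w, th, lab = bernoulli_mixture_em(data, k=2)
    print(lab)
```

In this sketch each cluster is summarized by a vector of per-feature probabilities `theta[j]`. The approach described in the abstract goes further by also estimating how salient each feature is and by selecting the number of clusters automatically, neither of which is attempted here.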