EM cluster analysis for categorical data

  • Authors:
  • Jiří Grim

  • Affiliations:
  • Institute of Information Theory and Automation of the Czech Academy of Sciences, Prague 8, Czech Republic

  • Venue:
  • SSPR'06/SPR'06 Proceedings of the 2006 joint IAPR international conference on Structural, Syntactic, and Statistical Pattern Recognition
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Distribution mixtures with product components have been applied repeatedly to determine clusters in multivariate data. Unfortunately, for categorical variables the mixture parameters are not uniquely identifiable and therefore the result of cluster analysis may become questionable. We give a simple proof that any non-degenerate discrete product mixture can be equivalently described by infinitely many different parameter sets. Nevertheless a unique result of cluster analysis can be guaranteed by additional constraints. We propose a heuristic method of sequential estimation of components to guarantee a unique identification of clusters by means of EM algorithm. The application of the method is illustrated by a numerical example.