Minimum Information Loss Cluster Analysis for Categorical Data

  • Authors:
  • Jiří Grim; Jan Hora

  • Affiliations:
  • Institute of Information Theory and Automation of the Czech Academy of Sciences, P.O. BOX 18, 18208 Prague 8, Czech Republic; Faculty of Nuclear Science and Physical Engineering, Czech Technical University, Trojanova 13, CZ-120 00 Prague 2, Czech Republic

  • Venue:
  • MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2007

Abstract

The EM algorithm has been used repeatedly to identify latent classes in categorical data by estimating finite distribution mixtures of product components. Unfortunately, the underlying mixtures are not uniquely identifiable and, moreover, the estimated mixture parameters depend on the starting point. For this reason we use the latent class model only to define a set of "elementary" classes by estimating a mixture of a large number of components. We propose a hierarchical "bottom up" cluster analysis based on sequentially unifying the elementary latent classes. The clustering procedure is controlled by a minimum information loss criterion.
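
The bottom-up merging step can be illustrated with a minimal sketch. The abstract does not state the exact form of the criterion, so the code below assumes one plausible reading: the information in question is the empirical mutual information I(X; C) = H(C) - H(C | X) between the data and the class variable, and at each step the two classes whose unification reduces this quantity least are merged. The input `post` is assumed to hold the posterior weights q(m | x_n) of the elementary components, as they would come out of a fitted EM mixture. The function names and the stopping rule `n_clusters` are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations


def mutual_information(post):
    """Empirical I(X; C) = H(C) - H(C | X) from rows of posteriors q(c | x_n)."""
    n = len(post)
    k = len(post[0])
    # Class weights estimated as the average posterior over the data.
    weights = [sum(row[c] for row in post) / n for c in range(k)]
    h_c = -sum(w * math.log(w) for w in weights if w > 0)
    h_c_given_x = -sum(q * math.log(q) for row in post for q in row if q > 0) / n
    return h_c - h_c_given_x


def merge_columns(post, a, b):
    """Unify classes a and b (a < b) by adding their posteriors."""
    return [
        [q + row[b] if c == a else q for c, q in enumerate(row) if c != b]
        for row in post
    ]


def min_info_loss_clustering(post, n_clusters):
    """Greedily merge elementary classes until n_clusters groups remain."""
    groups = [[m] for m in range(len(post[0]))]
    while len(groups) > n_clusters:
        current = mutual_information(post)
        # Assumed criterion: pick the pair whose merge loses the least information.
        a, b = min(
            combinations(range(len(groups)), 2),
            key=lambda ab: current - mutual_information(merge_columns(post, *ab)),
        )
        post = merge_columns(post, a, b)
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups


# Toy posteriors: components 0 and 1 are indistinguishable, component 2 is separate.
toy_post = [
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
toy_groups = min_info_loss_clustering(toy_post, 2)
```

In this toy input the first two components overlap completely, so unifying them costs essentially no information and they are merged first. Recomputing the criterion for every candidate pair keeps the sketch short; a practical implementation would update the loss incrementally.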