Towards a Simple Clustering Criterion Based on Minimum Length Encoding

  • Authors:
  • Marcus-Christopher Ludl;Gerhard Widmer

  • Affiliations:
  • -;-

  • Venue:
  • ECML '02 Proceedings of the 13th European Conference on Machine Learning
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose a simple and intuitive clustering evaluation criterion based on the minimum description length principle which yields a particularly simple way of describing and encoding a set of examples. The basic idea is to view a clustering as a restriction of the attribute domains, given an example's cluster membership. As a special operational case we develop the so-called rectangular uniform message length measure that can be used to evaluate clusterings described as sets of hyper-rectangles. We theoretically prove that this measure punishes cluster boundaries in regions of uniform instance distribution (i.e., unintuitive clusterings), and we experimentally compare a simple clustering algorithm using this measure with the well-known algorithms KMeans and AutoClass.