Towards a Simple Clustering Criterion Based on Minimum Length Encoding

Authors:
Marcus-Christopher Ludl;Gerhard Widmer
Venue:
ECML '02 Proceedings of the 13th European Conference on Machine Learning
Year:
2002

Citing 6
Cited 1

Clustering techniques for large data sets—from the past to the future

KDD '99 Tutorial notes of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining

Accelerating exact k-means algorithms with geometric reasoning

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining

Refining Initial Points for K-Means Clustering

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning

Relative Unsupervised Discretization for Association Rule Mining

PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery

Segmentation Using Eigenvectors: A Unifying View

ICCV '99 Proceedings of the International Conference on Computer Vision, Volume 2

A Framework for Experimental Evaluation of Clustering Techniques

IWPC '00 Proceedings of the 8th International Workshop on Program Comprehension

In Search of the Horowitz Factor: Interim Report on a Musical Discovery Project

DS '02 Proceedings of the 5th International Conference on Discovery Science

Abstract

We propose a simple and intuitive criterion for evaluating clusterings, based on the minimum description length principle, which yields a particularly simple way of describing and encoding a set of examples. The basic idea is to view a clustering as a restriction of the attribute domains, given an example's cluster membership. As a special operational case, we develop the so-called rectangular uniform message length measure, which can be used to evaluate clusterings described as sets of hyper-rectangles. We prove theoretically that this measure penalizes cluster boundaries in regions of uniform instance distribution (i.e., unintuitive clusterings), and we experimentally compare a simple clustering algorithm using this measure with the well-known algorithms KMeans and AutoClass.
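
The abstract does not give the formula for the rectangular uniform message length measure, so the following Python sketch is only an illustration of the general idea, not the authors' definition. It assumes a two-part code: a model cost for the rectangle boundaries, plus, for every example, the cost of its cluster label and of each attribute value coded uniformly within the cluster's restricted range. The function name `message_length`, the resolution `eps`, and the toy data are all hypothetical.

```python
import math

def message_length(points, boxes, labels, eps=0.01, domain=(0.0, 1.0)):
    """Illustrative two-part code length in bits (an assumed scheme,
    not the measure defined in the paper)."""
    n, d, k = len(points), len(points[0]), len(boxes)
    cells = (domain[1] - domain[0]) / eps  # distinguishable values per attribute
    # Model cost: two boundaries per attribute per box, each at resolution eps.
    bits = k * d * 2 * math.log2(cells)
    counts = [labels.count(c) for c in range(k)]
    for p, c in zip(points, labels):
        # Cluster membership, coded by relative cluster frequency.
        bits += -math.log2(counts[c] / n)
        # Each attribute value, coded uniformly within the box's range.
        for (a, b), x in zip(boxes[c], p):
            bits += math.log2(max((b - a) / eps, 1.0))
    return bits

# Toy 1-D data: an evenly spread set and two well-separated groups.
uniform = [(i / 100 + 0.005,) for i in range(100)]
separated = [(i * 0.002,) for i in range(50)] + [(0.9 + i * 0.002,) for i in range(50)]
halves = [0] * 50 + [1] * 50

one_box = [[(0.0, 1.0)]]
split_mid = [[(0.0, 0.5)], [(0.5, 1.0)]]
split_gap = [[(0.0, 0.1)], [(0.9, 1.0)]]

uniform_single = message_length(uniform, one_box, [0] * 100)
uniform_split = message_length(uniform, split_mid, halves)
separated_single = message_length(separated, one_box, [0] * 100)
separated_split = message_length(separated, split_gap, halves)

print(uniform_single, uniform_split)
print(separated_single, separated_split)
```

Under these assumptions, splitting the evenly spread data saves nothing on the attribute values (each narrower range is offset by the extra label bit) while adding boundary cost, so the single box is cheaper. For the two separated groups, the restricted ranges save far more than the extra boundaries cost, so the split is cheaper. This mirrors, in a simplified form, the property the abstract claims for the measure.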