Minimum description length and clustering with exemplars

  • Authors:
  • Po-Hsiang Lai; Joseph A. O'Sullivan; Robert Pless

  • Affiliations:
  • Department of Electrical and Systems Engineering, Washington University in St Louis, St Louis, MO; Department of Electrical and Systems Engineering, Washington University in St Louis, St Louis, MO; Department of Computer Science and Engineering, Washington University in St Louis, St Louis, MO

  • Venue:
  • ISIT'09: Proceedings of the 2009 IEEE International Symposium on Information Theory - Volume 2
  • Year:
  • 2009

Abstract

We propose an information-theoretic framework for density-based clustering and for similarity- or distance-based clustering, in which the objective functions that measure clustering performance are derived from stochastic complexity and minimum description length (MDL) arguments. Under this framework, the number of clusters and the model parameters can be determined in a principled way, without prior knowledge supplied by the user. We show that similarity-based clustering can be viewed as combinatorial optimization on graphs. We propose two clustering algorithms; one of them relies on a minimum arborescence algorithm and returns an optimal clustering under the proposed MDL objective function for similarity-based clustering. We demonstrate clustering performance on synthetic data.
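
To illustrate how a minimum arborescence can yield a clustering with exemplars, here is a minimal Python sketch. It is not the authors' algorithm or objective function: the cost model (a fixed `root_cost` in bits for coding a point on its own, and `log2(1 + |x_i - x_j|)` bits for coding a point given another), the virtual `ROOT` node, and all function names are assumptions made for illustration. An exhaustive search over parent assignments stands in for an efficient minimum arborescence algorithm such as Chu-Liu/Edmonds, so it is only practical for a handful of points.

```python
import itertools
import math

ROOT = -1  # hypothetical virtual root; its children act as exemplars


def edge_cost(points, parent, child, root_cost):
    """Assumed code length (bits) for describing `child` given `parent`."""
    if parent == ROOT:
        return root_cost  # point is coded on its own (becomes an exemplar)
    return math.log2(1.0 + abs(points[parent] - points[child]))


def reaches_root(parents, node):
    """True if following parent links from `node` ends at ROOT without a cycle."""
    seen = set()
    while node != ROOT:
        if node in seen:
            return False
        seen.add(node)
        node = parents[node]
    return True


def best_arborescence(points, root_cost):
    """Brute-force the cheapest arborescence rooted at ROOT (tiny inputs only)."""
    n = len(points)
    # Each node picks exactly one parent: the root or any other node.
    choices = [[ROOT] + [p for p in range(n) if p != c] for c in range(n)]
    best_cost, best_parents = math.inf, None
    for parents in itertools.product(*choices):
        if not all(reaches_root(parents, c) for c in range(n)):
            continue  # contains a cycle, so it is not an arborescence
        cost = sum(edge_cost(points, parents[c], c, root_cost) for c in range(n))
        if cost < best_cost:
            best_cost, best_parents = cost, parents
    return best_cost, best_parents


def clusters_from_parents(parents):
    """Group nodes by the exemplar (child of ROOT) at the top of their subtree."""
    groups = {}
    for node in range(len(parents)):
        top = node
        while parents[top] != ROOT:
            top = parents[top]
        groups.setdefault(top, set()).add(node)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0, 100.0, 101.0, 102.0]
    total_bits, parent_of = best_arborescence(data, root_cost=5.0)
    print(total_bits, clusters_from_parents(parent_of))
```

In this toy setting, each child of the virtual root plays the role of an exemplar and each subtree below it forms one cluster, so the number of clusters falls out of the minimization rather than being supplied in advance. Raising `root_cost` makes exemplars more expensive and tends to merge clusters.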