Data Driven Similarity Measures for k-Means Like Clustering Algorithms

  • Authors:
  • Jacob Kogan; Marc Teboulle; Charles Nicholas

  • Affiliations:
  • Department of Mathematics and Statistics, UMBC, Baltimore 21250; School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv, Israel; Department of Computer Science and Electrical Engineering, UMBC, Baltimore 21250

  • Venue:
  • Information Retrieval
  • Year:
  • 2005

Abstract

We present an optimization approach that generates k-means like clustering algorithms. The batch k-means and the incremental k-means are two well-known versions of the classical k-means clustering algorithm (Duda et al. 2000). To benefit from the speed of the batch version and the accuracy of the incremental version, we combine the two in a "ping-pong" fashion. We use a distance-like function that combines the squared Euclidean distance with relative entropy. In the extreme cases, our algorithm recovers the classical k-means clustering algorithm and generalizes the Divisive Information Theoretic clustering algorithm recently reported independently by Berkhin and Becher (2002) and Dhillon et al. (2002). Results of numerical experiments that demonstrate the viability of our approach are reported.
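
To make the idea of a combined distance-like function concrete, here is a minimal Python sketch. The abstract does not give the exact formula, so the weighted form used here, (nu/2) * ||x - c||^2 + mu * KL(x || c), along with the names `combined_distance`, `nearest_centroid`, `nu`, and `mu`, is an assumption for illustration only. Vectors are assumed to be nonnegative and normalized to sum to 1, with strictly positive centroid entries.

```python
import math


def combined_distance(x, c, nu=1.0, mu=1.0):
    """Hypothetical distance-like function (an assumption, not the paper's
    stated formula): nu/2 * squared Euclidean distance + mu * relative entropy.

    x and c are assumed to be probability vectors; c must be strictly positive.
    """
    squared_euclidean = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    # Relative entropy KL(x || c), using the convention 0 * log(0) = 0.
    relative_entropy = sum(
        xi * math.log(xi / ci) for xi, ci in zip(x, c) if xi > 0
    )
    return 0.5 * nu * squared_euclidean + mu * relative_entropy


def nearest_centroid(x, centroids, nu=1.0, mu=1.0):
    """Return the index of the centroid closest to x under combined_distance.

    This is the assignment step a batch k-means iteration would apply
    to every vector.
    """
    distances = [combined_distance(x, c, nu, mu) for c in centroids]
    return distances.index(min(distances))
```

Under these assumptions, setting `mu=0` leaves only the squared Euclidean term (the classical k-means setting), while setting `nu=0` leaves only the relative entropy term (the information-theoretic setting), which mirrors the two extreme cases mentioned in the abstract.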