A heuristic for non-convex variance-based clustering criteria

  • Authors:
  • Rodrigo F. Toso;Casimir A. Kulikowski;Ilya B. Muchnik

  • Affiliations:
  • Department of Computer Science, Rutgers University, Piscataway, NJ;Department of Computer Science, Rutgers University, Piscataway, NJ;DIMACS, Rutgers University, Piscataway, NJ

  • Venue:
  • SEA'12 Proceedings of the 11th international conference on Experimental Algorithms
  • Year:
  • 2012

Abstract

We address the clustering problem in the context of exploratory data analysis, where data sets are investigated from different and desirably contrasting perspectives. In this setting, where solutions are evaluated by criterion functions for the sake of flexibility, we introduce and evaluate a generalized and efficient version of the incremental one-by-one clustering algorithm of MacQueen (1967). Unlike the widely adopted two-phase algorithm developed by Lloyd (1957), our approach does not rely on the gradient of the criterion function being optimized, which gives it the key advantage of being able to handle non-convex criteria. In an extensive experimental analysis on real-world data sets, a more flexible, non-convex criterion function produced results considerably better than those obtained with the k-means criterion, making our algorithm an invaluable tool for exploratory clustering applications.
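
The idea of a one-by-one heuristic that treats the criterion as a black box can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `one_by_one` and `sum_of_squares` are hypothetical, the criterion is recomputed from scratch after every trial move (the paper describes an efficient version, whose update rules are not given in the abstract), and the k-means sum of squares is used only as a familiar stand-in for the criterion. Since the search only compares criterion values, any other callable, convex or not, could presumably be passed in its place.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Criterion = Callable[[List[List[Point]]], float]


def sum_of_squares(clusters: List[List[Point]]) -> float:
    """k-means criterion: total squared distance of points to their cluster centroid."""
    total = 0.0
    for members in clusters:
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def one_by_one(
    points: List[Point],
    labels: List[int],
    k: int,
    criterion: Criterion,
    max_passes: int = 100,
) -> Tuple[List[int], float]:
    """Relocate one point at a time whenever the move lowers the criterion.

    Only criterion values are compared; no gradient or centroid update
    rule is assumed. Naive sketch: each trial move re-evaluates the
    criterion over the whole partition.
    """
    labels = list(labels)

    def evaluate(current: List[int]) -> float:
        clusters = [[p for p, l in zip(points, current) if l == c] for c in range(k)]
        return criterion(clusters)

    best = evaluate(labels)
    for _ in range(max_passes):
        improved = False
        for i in range(len(points)):
            original = labels[i]
            best_label = original
            for c in range(k):
                if c == original:
                    continue
                labels[i] = c  # trial move of point i to cluster c
                value = evaluate(labels)
                if value < best - 1e-12:
                    best, best_label = value, c
            labels[i] = best_label  # keep the best move, or restore the original
            if best_label != original:
                improved = True
        if not improved:
            break  # local optimum: no single relocation helps
    return labels, best


# Usage: four 1-D points, deliberately poor starting partition.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels, value = one_by_one(points, [0, 1, 0, 1], k=2, criterion=sum_of_squares)
```

On this toy input the search ends with the two left-hand points in one cluster and the two right-hand points in the other. Like any local search of this kind, the result depends on the starting partition and on the order in which points are visited.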