A Decision Criterion for the Optimal Number of Clusters in Hierarchical Clustering

  • Authors:
  • Yunjae Jung; Haesun Park; Ding-Zhu Du; Barry L. Drake

  • Affiliations:
  • Qwest Communications, 600 Stinson Blvd., Minneapolis, MN 55413, USA (e-mail: yunjae@cs.umn.edu); Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA (e-mail: hpark@cs.umn.edu) and Korea Institute for Advanced Study 207-43 Cheongryangr ...; Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA; CDT, Inc., Minneapolis, MN 55454, USA (e-mail: bldrake1@yahoo.com)

  • Venue:
  • Journal of Global Optimization
  • Year:
  • 2003

Abstract

Clustering has been widely used to partition data into groups so that the degree of association is high among members of the same group and low among members of different groups. Though many effective and efficient clustering algorithms have been developed and deployed, most of them still lack an automatic or online way to decide the optimal number of clusters. In this paper, we define clustering gain as a measure of clustering optimality, based on the squared error sum computed as a clustering algorithm proceeds. When the measure is applied to a hierarchical clustering algorithm, an optimal number of clusters can be found. Experimental results show that the measure performs well, producing intuitively reasonable clustering configurations in Euclidean space. Furthermore, the measure can be used to estimate the desired number of clusters for partitional clustering methods as well. The clustering gain measure therefore provides a promising technique for achieving higher clustering quality across a wide range of clustering methods.
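
The abstract does not state the formula for clustering gain, so the following is only a rough sketch of the idea under stated assumptions. It assumes the gain of a configuration is the sum over clusters of (n_j - 1) * ||c_j - c_0||^2, where n_j is the size of cluster j, c_j its centroid, and c_0 the centroid of all points. It also assumes a simple agglomerative procedure that merges the two clusters with the closest centroids. The function names (`clustering_gain`, `best_cluster_count`) are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def clustering_gain(clusters, global_centroid):
    # Assumed form of the measure: sum_j (n_j - 1) * ||c_j - c_0||^2.
    # It is 0 when every point is its own cluster and 0 for one big cluster.
    return sum(
        (len(c) - 1) * sq_dist(centroid(c), global_centroid) for c in clusters
    )


def best_cluster_count(points):
    """Merge closest-centroid clusters step by step and keep the
    configuration with the largest (assumed) clustering gain."""
    clusters = [[p] for p in points]
    c0 = centroid(points)
    best_k, best_gain = len(clusters), 0.0
    best_clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        # Pick the pair of clusters whose centroids are closest (i < j).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: sq_dist(
                centroid(clusters[ij[0]]), centroid(clusters[ij[1]])
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        gain = clustering_gain(clusters, c0)
        if gain > best_gain:
            best_k, best_gain = len(clusters), gain
            best_clusters = [list(c) for c in clusters]
    return best_k, best_gain, best_clusters
```

On a small set of well-separated groups, the gain under this assumed definition rises while points within a group are merged and falls once distinct groups are merged, so its maximum marks a candidate number of clusters. The brute-force pair search is cubic in the number of points and is meant only for illustration.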