Learning the Threshold in Hierarchical Agglomerative Clustering

  • Authors: Kristine Daniels; Christophe Giraud-Carrier
  • Affiliations: Brigham Young University, USA; Brigham Young University, USA
  • Venue: ICMLA '06 Proceedings of the 5th International Conference on Machine Learning and Applications
  • Year: 2006

Abstract

Most partitional clustering algorithms require the number of desired clusters to be set a priori. Not only is this somewhat counter-intuitive, it is also difficult except in the simplest of situations. By contrast, hierarchical clustering may create partitions with varying numbers of clusters. The actual final partition depends on a threshold placed on the similarity measure used. Given a cluster quality metric, one can efficiently discover an appropriate threshold through a form of semi-supervised learning. This paper shows one such solution for complete-link hierarchical agglomerative clustering using the F-measure and a small subset of labeled examples. Empirical evaluation demonstrates promise.
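
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which F-measure variant or search procedure the paper uses, so this sketch assumes a pairwise F-measure computed over the labeled examples only, and it simply tries every merge distance of a naive complete-link dendrogram as a candidate threshold. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist

def complete_link_merges(points):
    """Naive complete-link HAC; returns merges as (distance, members_a, members_b)."""
    clusters = {i: [i] for i in range(len(points))}
    merges, next_id = [], len(points)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Complete link: cluster distance is the largest pairwise distance.
            d = max(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((d, list(clusters[a]), list(clusters[b])))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges

def cut(merges, n, threshold):
    """Replay merges whose distance is at most `threshold`; return cluster labels."""
    label = list(range(n))
    for d, a_members, b_members in merges:
        if d > threshold:  # complete-link merge distances never decrease
            break
        target = label[a_members[0]]
        for i in b_members:
            label[i] = target
    return label

def pairwise_f(label, labeled):
    """Pairwise F-measure over labeled points only (an assumed variant)."""
    tp = fp = fn = 0
    for i, j in combinations(sorted(labeled), 2):
        same_cluster = label[i] == label[j]
        same_class = labeled[i] == labeled[j]
        tp += same_cluster and same_class
        fp += same_cluster and not same_class
        fn += same_class and not same_cluster
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def learn_threshold(points, labeled):
    """Pick the merge distance that maximizes F on the labeled subset."""
    merges = complete_link_merges(points)
    best_t, best_f = None, -1.0
    for d, _, _ in merges:
        f = pairwise_f(cut(merges, len(points), d), labeled)
        if f > best_f:  # ties keep the smallest threshold
            best_t, best_f = d, f
    return best_t, best_f

# Toy data (hypothetical): two well-separated groups, four labeled points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labeled = {0: "a", 2: "a", 3: "b", 5: "b"}
threshold, score = learn_threshold(points, labeled)
labels = cut(complete_link_merges(points), len(points), threshold)
```

On this toy data the sketch selects the threshold at which each group of three points has just merged into one cluster, and the learned threshold is then applied to all points, labeled or not. The naive merge loop is only suitable for small inputs; a library implementation of complete-link clustering would be the practical choice at scale.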