Learning the Threshold in Hierarchical Agglomerative Clustering

Authors:
Kristine Daniels;Christophe Giraud-Carrier
Affiliations:
Brigham Young University, USA;Brigham Young University, USA
Venue:
ICMLA '06 Proceedings of the 5th International Conference on Machine Learning and Applications
Year:
2006

Abstract

Most partitional clustering algorithms require the number of desired clusters to be set a priori. Not only is this somewhat counter-intuitive, it is also difficult except in the simplest of situations. By contrast, hierarchical clustering may create partitions with varying numbers of clusters. The actual final partition depends on a threshold placed on the similarity measure used. Given a cluster quality metric, one can efficiently discover an appropriate threshold through a form of semi-supervised learning. This paper shows one such solution for complete-link hierarchical agglomerative clustering using the F-measure and a small subset of labeled examples. Empirical evaluation demonstrates promise.
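
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give their procedure in detail. The sketch assumes Euclidean distance (so the threshold is a cut-off on distance rather than similarity), a pairwise form of the F-measure computed only over the labeled examples, and a simple search over a hand-picked list of candidate thresholds. The function names, the toy data, and the candidate list are all hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def complete_link_clusters(points, threshold):
    """Agglomerate until the closest pair of clusters, measured by
    complete link (largest pairwise distance), exceeds the threshold."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = max(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # no remaining pair is close enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def pairwise_f_measure(clusters, labels):
    """Pairwise F-measure over the labeled subset only.
    labels maps a point index to its known class (assumed form)."""
    assign = {i: c for c, members in enumerate(clusters) for i in members}
    tp = fp = fn = 0
    for i, j in combinations(sorted(labels), 2):
        same_cluster = assign[i] == assign[j]
        same_class = labels[i] == labels[j]
        if same_cluster and same_class:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def learn_threshold(points, labels, candidates):
    """Return the candidate threshold whose partition scores best."""
    return max(
        candidates,
        key=lambda t: pairwise_f_measure(
            complete_link_clusters(points, t), labels),
    )


# Toy data: two well-separated groups, four of six points labeled.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = {0: "a", 1: "a", 3: "b", 4: "b"}
best = learn_threshold(points, labels, [0.5, 2.0, 20.0])
print(best)  # 2.0
```

On this toy data, a threshold of 0.5 leaves every point in its own cluster, 20.0 merges everything into one cluster, and 2.0 recovers the two groups, so the labeled pairs favor 2.0. The naive merge loop is written for readability, not efficiency.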