Automatic extraction of clusters from hierarchical clustering representations

  • Authors:
  • Jörg Sander;Xuejie Qin;Zhiyong Lu;Nan Niu;Alex Kovarsky

  • Affiliations:
  • Department of Computing Science, University of Alberta, Edmonton, AB, Canada;Department of Computing Science, University of Alberta, Edmonton, AB, Canada;Department of Computing Science, University of Alberta, Edmonton, AB, Canada;Department of Computing Science, University of Alberta, Edmonton, AB, Canada;Department of Computing Science, University of Alberta, Edmonton, AB, Canada

  • Venue:
  • PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
  • Year:
  • 2003

Abstract

Hierarchical clustering algorithms are typically more effective at detecting the true clustering structure of a data set than partitioning algorithms. However, hierarchical clustering algorithms do not actually create clusters; they compute only a hierarchical representation of the data set. This makes them unsuitable as an automatic pre-processing step for other algorithms that operate on detected clusters. This holds for both dendrograms and reachability plots, which have been proposed as hierarchical clustering representations and which have different advantages and disadvantages. In this paper we first investigate the relation between dendrograms and reachability plots and introduce methods to convert each into the other, showing that they contain essentially the same information. Based on reachability plots, we then introduce a technique that automatically determines the significant clusters in a hierarchical clustering representation. This makes it possible, for the first time, to use hierarchical clustering as an automatic pre-processing step that requires no user interaction to select clusters from a hierarchical clustering representation.
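
To make the relation between the two representations concrete, the following is a minimal sketch of one way a dendrogram could be turned into a reachability-plot-like sequence. It is not the authors' procedure, which the abstract does not spell out. It assumes a binary dendrogram given as nested tuples `(left, right, height)`, and it assumes that the bar height of each leaf is the merge distance at which that leaf joins the leaves to its left. The names `dendrogram_to_reachability` and `split_at` are hypothetical, and `split_at` is only a naive fixed-threshold cut, not the significance-based cluster extraction the paper introduces.

```python
from math import inf

# Assumed input format (hypothetical): a dendrogram node is either a leaf
# label (any non-tuple value) or a tuple (left, right, height), where
# height is the distance at which the two subtrees are merged.

def dendrogram_to_reachability(node):
    """Return (order, reach): the leaves from left to right and, for each
    leaf, the height of the merge that joins it to the leaves before it."""
    if not isinstance(node, tuple):
        # A single leaf has no predecessor yet, so its value is undefined.
        return [node], [inf]
    left, right, height = node
    l_order, l_reach = dendrogram_to_reachability(left)
    r_order, r_reach = dendrogram_to_reachability(right)
    # The first leaf of the right subtree meets everything on the left
    # exactly at this node's merge height.
    r_reach[0] = height
    return l_order + r_order, l_reach + r_reach

def split_at(order, reach, threshold):
    """Naive flat cut (an illustration only): start a new group wherever
    the bar height exceeds the threshold."""
    groups = []
    for label, r in zip(order, reach):
        if r > threshold or not groups:
            groups.append([])
        groups[-1].append(label)
    return groups

# Toy dendrogram: {a, b, c} and {d, e} are merged last, at height 5.0.
tree = ((("a", "b", 1.0), "c", 2.0), ("d", "e", 1.5), 5.0)
order, reach = dendrogram_to_reachability(tree)
groups = split_at(order, reach, 3.0)
```

In this toy example the tall bar at `d` marks the boundary between the two groups, which is the kind of "valley" structure a reachability plot exposes. A fixed threshold is used here only because it is the simplest stand-in; choosing the significant clusters without a user-supplied threshold is precisely what the paper's technique addresses.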