Automatic extraction of clusters from hierarchical clustering representations

Authors:
Jörg Sander;Xuejie Qin;Zhiyong Lu;Nan Niu;Alex Kovarsky
Affiliations:
Department of Computing Science, University of Alberta, Edmonton, AB, Canada;Department of Computing Science, University of Alberta, Edmonton, AB, Canada;Department of Computing Science, University of Alberta, Edmonton, AB, Canada;Department of Computing Science, University of Alberta, Edmonton, AB, Canada;Department of Computing Science, University of Alberta, Edmonton, AB, Canada
Venue:
PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2003

Abstract

Hierarchical clustering algorithms are typically more effective than partitioning algorithms at detecting the true clustering structure of a data set. However, they do not actually create clusters; they compute only a hierarchical representation of the data set. This makes them unsuitable as an automatic pre-processing step for other algorithms that operate on detected clusters. This holds for both dendrograms and reachability plots, which have been proposed as hierarchical clustering representations and which have different advantages and disadvantages. In this paper we first investigate the relation between dendrograms and reachability plots and introduce methods to convert each into the other, showing that they contain essentially the same information. Based on reachability plots, we then introduce a technique that automatically determines the significant clusters in a hierarchical cluster representation. This makes it possible, for the first time, to use hierarchical clustering as an automatic pre-processing step that requires no user interaction to select clusters from a hierarchical cluster representation.
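
The relation between the two representations can be illustrated with a minimal sketch. It is not the paper's algorithm, whose details are not given in the abstract. It assumes a binary dendrogram whose internal nodes store a merge height, and it assigns each leaf the height at which it is first joined with the leaf preceding it in left-to-right order. The names `Leaf`, `Merge`, `dendrogram_to_reachability`, and `split_at_threshold` are hypothetical. The fixed-threshold cut at the end is a deliberate simplification that still requires a user-chosen value, which is exactly the kind of interaction the paper's automatic technique aims to avoid.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Merge:
    left: "Node"
    right: "Node"
    height: float  # distance at which the two subtrees are merged


Node = Union[Leaf, Merge]


def dendrogram_to_reachability(root: Node) -> List[Tuple[str, float]]:
    """Return (label, value) pairs in left-to-right leaf order.

    Assumption for this sketch: the value of a leaf is the merge height
    of its lowest common ancestor with the previous leaf. The first leaf
    has no predecessor and gets infinity.
    """
    plot: List[Tuple[str, float]] = []

    def visit(node: Node, incoming: float) -> None:
        if isinstance(node, Leaf):
            plot.append((node.label, incoming))
            return
        # The first leaf of the left subtree keeps the incoming value;
        # the first leaf of the right subtree is joined at this node's height.
        visit(node.left, incoming)
        visit(node.right, node.height)

    visit(root, float("inf"))
    return plot


def split_at_threshold(plot: List[Tuple[str, float]], threshold: float) -> List[List[str]]:
    """Simplified horizontal cut: a value above the threshold starts a new group."""
    clusters: List[List[str]] = []
    for label, value in plot:
        if value > threshold or not clusters:
            clusters.append([label])
        else:
            clusters[-1].append(label)
    return clusters


# Two tight pairs (a, b) and (c, d) that are merged only at height 5.0.
tree = Merge(
    Merge(Leaf("a"), Leaf("b"), 1.0),
    Merge(Leaf("c"), Leaf("d"), 2.0),
    5.0,
)
plot = dendrogram_to_reachability(tree)
print(plot)                           # [('a', inf), ('b', 1.0), ('c', 5.0), ('d', 2.0)]
print(split_at_threshold(plot, 3.0))  # [['a', 'b'], ['c', 'd']]
```

In the resulting sequence, the high value at `c` separates two valleys, one per cluster, which is how clusters appear visually in a reachability plot.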