A Decision Criterion for the Optimal Number of Clusters in Hierarchical Clustering

Authors:
Yunjae Jung; Haesun Park; Ding-Zhu Du; Barry L. Drake
Affiliations:
Qwest Communications, 600 Stinson Blvd., Minneapolis, MN 55413, USA (e-mail: yunjae@cs.umn.edu); Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA (e-mail: hpark@cs.umn.edu), and Korea Institute for Advanced Study, 207-43 Cheongryangr ...; Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA; CDT, Inc., Minneapolis, MN 55454, USA (e-mail: bldrake1@yahoo.com)
Venue:
Journal of Global Optimization
Year:
2003

Abstract

Clustering is widely used to partition data into groups so that the degree of association is high among members of the same group and low among members of different groups. Although many effective and efficient clustering algorithms have been developed and deployed, most still lack a way to decide the optimal number of clusters automatically or online. In this paper, we define clustering gain as a measure of clustering optimality, based on the squared error sum tracked as a clustering algorithm proceeds. When the measure is applied to a hierarchical clustering algorithm, an optimal number of clusters can be found. Experimental results show that the measure performs well, producing intuitively reasonable clustering configurations in Euclidean space. The measure can also be used to estimate the desired number of clusters for partitional clustering methods. The clustering gain measure therefore offers a promising technique for achieving higher clustering quality across a wide range of clustering methods.
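The abstract does not state the gain formula, so the following is only a minimal sketch of the idea under stated assumptions. It assumes the gain of a cluster j with n_j members and centroid c_j is (n_j - 1)·||c_j - c0||², where c0 is the centroid of all the data. This equals the drop in total squared error (intra-cluster error plus one inter-cluster term ||c_j - c0||² per cluster) relative to leaving every point in its own cluster, which follows from the identity Σ||x - c0||² = Σ||x - c_j||² + n_j·||c_j - c0||². The hierarchy here is built with Ward's merge criterion, a substitution chosen for illustration. The helper names (`clustering_gain`, `best_k_by_gain`) are hypothetical. The level of the hierarchy with the largest total gain is returned.

```python
from math import fsum

Point = tuple[float, ...]


def centroid(points: list[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(fsum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def clustering_gain(clusters: list[list[Point]], c0: Point) -> float:
    # ASSUMED form of the gain: sum over clusters of (n_j - 1) * ||c_j - c0||^2.
    # It is 0 when every point is a singleton and 0 when there is one cluster.
    return fsum((len(c) - 1) * sq_dist(centroid(c), c0) for c in clusters)


def best_k_by_gain(points: list[Point]) -> tuple[int, list[list[Point]]]:
    """Agglomerate with Ward's criterion and keep the level of maximum gain."""
    c0 = centroid(points)
    clusters = [[p] for p in points]
    best_gain = clustering_gain(clusters, c0)
    best = [list(c) for c in clusters]
    while len(clusters) > 1:
        # Ward's criterion (an illustrative choice): merge the pair whose
        # union increases the squared error sum the least.
        cents = [centroid(c) for c in clusters]
        best_pair, best_cost = (0, 1), float("inf")
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ni, nj = len(clusters[i]), len(clusters[j])
                cost = ni * nj / (ni + nj) * sq_dist(cents[i], cents[j])
                if cost < best_cost:
                    best_pair, best_cost = (i, j), cost
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        # Track the configuration with the largest total gain seen so far.
        gain = clustering_gain(clusters, c0)
        if gain > best_gain:
            best_gain, best = gain, [list(c) for c in clusters]
    return len(best), best
```

For two well-separated 2×2 grids of points, around (0, 0) and (10, 10), this sketch selects k = 2. In the spirit of the abstract's last claim, that k could then be handed to a partitional method such as k-means.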