A data partitioning approach for hierarchical clustering

Authors:
Seok-Ho Yoon;Suk-Soon Song;Sang-Chul Lee;Kyo-Sung Jeong;Sang-Wook Kim;Sooyong Kang;Yong Suk Choi;Jaehyuk Cha;Minsoo Ryu;Byung-Soo Jeong
Affiliations:
Hanyang University, Seoul, Korea;Hanyang University, Seoul, Korea;Hanyang University, Seoul, Korea;Hanyang University, Seoul, Korea;Hanyang University, Seoul, Korea;Hanyang University, Seoul, Korea;Hanyang University, Seoul, Korea;Hanyang University, Seoul, Korea;Hanyang University, Seoul, Korea;Kyung Hee University, Seoul, Korea
Venue:
Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication
Year:
2013

Abstract

In this paper, we propose a parameter-insensitive data partitioning approach for Chameleon, a hierarchical clustering algorithm. The proposed method first partitions a given dataset into every possible number of initial sub-clusters, using existing algorithms that allow arbitrary-sized sub-clusters. It then evaluates the quality of each resulting set of initial sub-clusters with our measurement function and selects the set with the highest score as the optimal one. Finally, it repeatedly merges these optimal initial sub-clusters to produce the final clustering result. Extensive experiments show that the proposed approach is insensitive to parameters and produces final clusters of higher quality than the previous approach.
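
The selection step described in the abstract (partition for every candidate count, score each partition, keep the best) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the abstract does not specify the partitioning algorithms or the measurement function, so `partition_by_gaps` (cutting sorted 1-D values at the widest gaps) and `silhouette` (the mean silhouette coefficient) are hypothetical stand-ins, not the authors' method. The final merging phase is omitted.

```python
from statistics import mean


def partition_by_gaps(values, k):
    """Hypothetical stand-in partitioner: split sorted 1-D values at the
    k-1 widest gaps, so sub-clusters may have arbitrary sizes."""
    pts = sorted(values)
    # Index i denotes a cut between pts[i] and pts[i + 1].
    gaps = sorted(range(len(pts) - 1),
                  key=lambda i: pts[i + 1] - pts[i], reverse=True)
    cuts = sorted(gaps[:k - 1])
    parts, start = [], 0
    for c in cuts:
        parts.append(pts[start:c + 1])
        start = c + 1
    parts.append(pts[start:])
    return parts


def silhouette(parts):
    """Hypothetical stand-in for the paper's measurement function:
    mean silhouette coefficient over all points (higher is better)."""
    scores = []
    for i, part in enumerate(parts):
        for x in part:
            if len(part) == 1:
                scores.append(0.0)  # convention: singletons score 0
                continue
            # a: mean distance to the other points in the same sub-cluster
            a = sum(abs(x - y) for y in part) / (len(part) - 1)
            # b: mean distance to the nearest other sub-cluster
            b = min(mean(abs(x - y) for y in other)
                    for j, other in enumerate(parts) if j != i)
            denom = max(a, b)
            scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


def best_initial_partition(values, partition=partition_by_gaps,
                           measure=silhouette):
    """Try every candidate number of sub-clusters and keep the partition
    with the highest measured quality."""
    candidates = [partition(values, k) for k in range(2, len(values))]
    return max(candidates, key=measure)


data = [1.0, 1.1, 1.2, 5.0, 5.1, 5.3, 9.0, 9.2]
best = best_initial_partition(data)
print(len(best), best)
```

Because no sub-cluster count is supplied by the caller, the only tunable pieces in this sketch are the partitioner and the quality measure, both passed as arguments so they can be swapped for other choices.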