Hierarchical K-means clustering algorithm based on silhouette and entropy

Authors:
Wuzhou Dong;JiaDong Ren;Dongmei Zhang
Affiliations:
College of Information Science and Engineering, Yanshan University, Qinhuangdao, China;College of Information Science and Engineering, Yanshan University, Qinhuangdao, China;Qinhuangdao Port Co., Ltd., Qinhuangdao, China
Venue:
AICI'11 Proceedings of the Third international conference on Artificial intelligence and computational intelligence - Volume Part I
Year:
2011



Abstract

Hierarchical K-means clustering is an important clustering task in data mining. Existing hierarchical K-means (HK) algorithms have high time complexity, and most are sensitive to noise. To address both problems, a hierarchical K-means clustering algorithm based on silhouette and entropy (HKSE) is proposed. In HKSE, the optimal number of clusters is obtained by calculating an improved silhouette of the dataset to be clustered, which reduces the time complexity from O(n²) to O(k × n). In the hierarchical clustering phase, entropy is introduced as the similarity measure in place of distance calculation, in order to reduce the effect of outliers on cluster quality. In the post-processing phase, outlier clusters are identified by computing the weighted distance between clusters. Experimental results show that HKSE is effective in reducing time complexity and sensitivity to noise.
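The abstract does not give the formula for the improved silhouette, so the following is only a sketch of one way a silhouette score can be brought down to O(k × n) distance evaluations. It assumes the commonly described centroid-based ("simplified") silhouette, in which each point is compared against the k cluster centroids instead of against every other point. The function name `simplified_silhouette` and its arguments are illustrative and do not come from the paper.

```python
import math


def simplified_silhouette(points, labels, centroids):
    """Mean centroid-based silhouette (an assumed stand-in, not HKSE's formula).

    points:    list of coordinate tuples
    labels:    cluster index for each point
    centroids: list of k centroid tuples

    Each point needs k distance evaluations, so the total cost is O(k * n)
    rather than the O(n^2) of the pairwise silhouette.
    """
    if len(centroids) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for point, own in zip(points, labels):
        # a: distance to the point's own centroid (cohesion)
        a = math.dist(point, centroids[own])
        # b: distance to the nearest other centroid (separation)
        b = min(
            math.dist(point, c)
            for j, c in enumerate(centroids)
            if j != own
        )
        denom = max(a, b)
        total += 0.0 if denom == 0 else (b - a) / denom
    return total / len(points)


# Two well-separated pairs of points, scored under two candidate partitions.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = simplified_silhouette(points, [0, 0, 1, 1], [(0, 0.5), (10, 0.5)])
bad = simplified_silhouette(points, [0, 1, 0, 1], [(5, 0), (5, 1)])
```

Under this assumption, a candidate number of clusters would be chosen by running K-means for each candidate k and keeping the partition with the highest score; here the partition that matches the two visible groups scores higher than the one that cuts across them.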