Neighborhood density method for selecting initial cluster centers in k-means clustering

Authors:
Yunming Ye;Joshua Zhexue Huang;Xiaojun Chen;Shuigeng Zhou;Graham Williams;Xiaofei Xu
Affiliations:
Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China;E-Business Technology Institute, University of Hong Kong, Hong Kong;Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China;Department of Computer Science and Engineering, Fudan University, Shanghai, China;Australian Taxation Office, Australia;Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
Venue:
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Year:
2006

References (5):
Data clustering: a review. ACM Computing Surveys (CSUR)
An empirical comparison of four initialization methods for the K-Means algorithm. Pattern Recognition Letters
Refining Initial Points for K-Means Clustering. ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
An experimental comparison of several clustering and initialization methods. UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
A neighborhood-based clustering algorithm. PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining

Cited by (2):
A subspace decision cluster classifier for text classification. Expert Systems with Applications: An International Journal
Flock by leader: a novel machine learning biologically inspired clustering algorithm. ICSI'12 Proceedings of the Third international conference on Advances in Swarm Intelligence - Volume Part II

Abstract

This paper presents a new method for effectively selecting initial cluster centers in k-means clustering. The method first identifies high-density neighborhoods in the data and then selects the central points of those neighborhoods as initial centers. The recently published Neighborhood-Based Clustering (NBC) algorithm is used to search for high-density neighborhoods. The new clustering algorithm, NK-means, integrates NBC into the k-means clustering process to improve the performance of k-means while preserving its efficiency. NBC is enhanced with a new cell-based neighborhood search method to accelerate the search for initial cluster centers. A merging method filters out insignificant initial centers so that too many clusters are not generated. Experimental results on synthetic data sets show significant improvements in clustering accuracy over the random k-means and refinement k-means algorithms.
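
The general idea in the abstract (seed k-means from the central points of high-density neighborhoods, then merge away redundant seeds) can be illustrated with a minimal sketch. This is not the authors' NK-means: it swaps NBC for a simple k-nearest-neighbor density proxy and uses a brute-force distance matrix rather than the cell-based search. The names `density_seeds`, `kmeans`, `n_neighbors`, and `merge_radius`, as well as the default merge heuristic, are assumptions made for illustration only.

```python
import numpy as np

def density_seeds(X, k, n_neighbors=10, merge_radius=None):
    """Pick up to k initial centers from high-density neighborhoods.

    Illustrative only: density is approximated by the inverse mean distance
    to the n_neighbors nearest points (an assumption, not NBC).
    """
    X = np.asarray(X, dtype=float)
    # Brute-force pairwise Euclidean distances, O(n^2); fine for a small demo.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Nearest neighbors of each point; column 0 is normally the point itself.
    nn = np.argsort(d, axis=1)[:, 1:n_neighbors + 1]
    mean_nn = np.take_along_axis(d, nn, axis=1).mean(axis=1)
    density = 1.0 / (mean_nn + 1e-12)
    if merge_radius is None:
        # Hypothetical default; a sensible value depends on the data scale.
        merge_radius = 3.0 * np.median(mean_nn)
    centers = []
    for i in np.argsort(-density):  # densest neighborhoods first
        # Central point of the neighborhood: mean of the point and its neighbors.
        c = X[np.append(nn[i], i)].mean(axis=0)
        # Merging step: skip candidates too close to an already accepted seed.
        if all(np.linalg.norm(c - s) > merge_radius for s in centers):
            centers.append(c)
        if len(centers) == k:
            break
    return np.array(centers)

def kmeans(X, centers, n_iter=20):
    """Plain Lloyd iterations started from the given centers."""
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    # Final assignment against the updated centers.
    labels = np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(axis=1)
    return centers, labels
```

A possible usage is `centers, labels = kmeans(X, density_seeds(X, k=3))`. The merge radius controls how aggressively nearby seeds are filtered: if it is too small, several seeds may fall inside one dense cluster, and if it is too large, fewer than `k` seeds may be returned.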