Neighborhood density method for selecting initial cluster centers in k-means clustering

  • Authors:
  • Yunming Ye; Joshua Zhexue Huang; Xiaojun Chen; Shuigeng Zhou; Graham Williams; Xiaofei Xu

  • Affiliations:
  • Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China; E-Business Technology Institute, University of Hong Kong, Hong Kong; Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China; Department of Computer Science and Engineering, Fudan University, Shanghai, China; Australian Taxation Office, Australia; Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China

  • Venue:
  • PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2006

Abstract

This paper presents a new method for selecting initial cluster centers in k-means clustering. The method first identifies high-density neighborhoods in the data and then selects the central points of those neighborhoods as initial centers. The recently published Neighborhood-Based Clustering (NBC) algorithm is used to search for the high-density neighborhoods. The new clustering algorithm, NK-means, integrates NBC into the k-means clustering process to improve the performance of k-means while preserving its efficiency. NBC is enhanced with a new cell-based neighborhood search method that accelerates the search for initial cluster centers. A merging method filters out insignificant initial centers so that too many clusters are not generated. Experimental results on synthetic data sets show significant improvements in clustering accuracy over the random k-means and the refinement k-means algorithms.
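
The selection idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' NK-means and not the NBC algorithm: it substitutes a simple fixed-size grid count for NBC's neighborhood density, and the function name `density_initial_centers` and the parameters `cell_size` and `merge_radius` are hypothetical names introduced here. It shows only the three steps the abstract names: a cell-based search for dense regions, taking the central point of each dense region as a candidate, and merging candidates that lie too close together.

```python
from collections import defaultdict
import math


def density_initial_centers(points, k, cell_size, merge_radius):
    """Pick up to k initial centers from dense grid cells (illustrative only).

    Assumption: density is approximated by the number of points that fall
    into a fixed-size grid cell. The paper uses NBC neighborhoods instead.
    """
    # Cell-based search: bin each point into a grid cell keyed by the
    # integer coordinates of that cell.
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(p)

    # One candidate per cell: the mean of its points, paired with the count.
    candidates = []
    for members in cells.values():
        dim = len(members[0])
        centroid = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
        candidates.append((len(members), centroid))

    # Merging step: visit candidates from densest to sparsest and drop any
    # candidate within merge_radius of a center that was already accepted.
    candidates.sort(key=lambda c: -c[0])
    centers = []
    for _, c in candidates:
        if all(math.dist(c, a) > merge_radius for a in centers):
            centers.append(c)
        if len(centers) == k:
            break
    return centers


# Usage: two dense groups plus one isolated point; the isolated point is
# never chosen because the two denser cells fill the k = 2 slots first.
data = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.1),
        (5.1, 5.1), (5.2, 5.3), (5.3, 5.2),
        (9.9, 0.0)]
centers = density_initial_centers(data, k=2, cell_size=1.0, merge_radius=2.0)
```

The returned centers would then be passed to an ordinary k-means loop in place of randomly chosen starting points. Note that this sketch may return fewer than `k` centers when merging removes candidates, which mirrors the abstract's point that merging limits the number of clusters generated.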