K-nearest-neighbor consistency in data clustering: incorporating local information into global optimization

  • Authors: Chris Ding; Xiaofeng He
  • Affiliations: Lawrence Berkeley National Laboratory, Berkeley, CA (both authors)
  • Venue: Proceedings of the 2004 ACM Symposium on Applied Computing
  • Year: 2004

Quantified Score

Hi-index 0.02

Abstract

Nearest neighbor consistency is a central concept in statistical pattern recognition, especially in the kNN classification method, which has a strong theoretical foundation. In this paper, we extend this concept to data clustering by requiring that, for any data point in a cluster, its k nearest neighbors (kNN) and k mutual nearest neighbors (kMN) also be in the same cluster. We study the properties of cluster k-nearest-neighbor consistency and propose algorithms that enforce and improve kNN and kMN consistency. Extensive experiments on internet newsgroup datasets, using the K-means clustering algorithm with kNN consistency enhancement, show that kNN / kMN consistency can be improved significantly (about 100% for 1MN and 1NN consistencies) while clustering accuracy improves at the same time. This indicates that local consistency information helps optimize the global clustering objective function.
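To make the consistency requirement concrete, here is a minimal Python sketch of how one might measure it for a given clustering. It is an illustration based only on the definition in the abstract, not the paper's own algorithm or scoring formula. The function names (`k_nearest`, `knn_consistency`, `kmn_consistency`), the use of Euclidean distance, the reading of "mutual nearest neighbors" as pairs that appear in each other's k-nearest-neighbor lists, and the choice to count a point with no mutual neighbors as consistent are all assumptions made for this example.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def knn_consistency(points, labels, k=1):
    """Fraction of points whose k nearest neighbors all share the point's label."""
    consistent = 0
    for i in range(len(points)):
        if all(labels[j] == labels[i] for j in k_nearest(points, i, k)):
            consistent += 1
    return consistent / len(points)


def kmn_consistency(points, labels, k=1):
    """Like knn_consistency, but only mutual neighbors are checked.

    Assumption: i and j are mutual neighbors when each is in the other's
    k-nearest-neighbor list; a point with no mutual neighbors counts as
    consistent.
    """
    nn = [set(k_nearest(points, i, k)) for i in range(len(points))]
    consistent = 0
    for i in range(len(points)):
        mutual = [j for j in nn[i] if i in nn[j]]
        if all(labels[j] == labels[i] for j in mutual):
            consistent += 1
    return consistent / len(points)


# Toy data: two tight pairs plus one point (index 4) assigned to cluster 1
# even though its nearest neighbor (index 1) belongs to cluster 0.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (2.0, 1.0)]
labels = [0, 0, 1, 1, 1]

print(knn_consistency(points, labels, k=1))  # 0.8
print(kmn_consistency(points, labels, k=1))  # 1.0
```

In this toy example the last point violates 1NN consistency, so the 1NN score is 0.8. It has no mutual nearest neighbor, so under the assumption above the 1MN score stays at 1.0, which illustrates that mutual-neighbor consistency is the weaker of the two conditions.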