Adapting k-means algorithm for discovering clusters in subspaces

  • Authors:
  • Yanchang Zhao; Chengqi Zhang; Shichao Zhang; Lianwei Zhao

  • Affiliations:
  • Faculty of Information Technology, University of Technology, Sydney, Australia; Faculty of Information Technology, University of Technology, Sydney, Australia; Faculty of Information Technology, University of Technology, Sydney, Australia; Dept. of Computer Science, Beijing Jiaotong University, Beijing, China

  • Venue:
  • APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
  • Year:
  • 2006

Abstract

Subspace clustering is a challenging task in data mining. In very high-dimensional data spaces, traditional distance measures fail to differentiate the furthest point from the nearest one. To tackle this problem, we design a minimal subspace distance, which measures the similarity between two points in the subspace where they are nearest to each other. In measuring the similarity between points, it can discover subspace clusters implicitly. We use the new similarity measure to adapt the traditional k-means algorithm for discovering clusters in subspaces. Clustering first with a low-dimensional minimal subspace distance detects the clusters in low-dimensional subspaces. Gradually increasing the dimension of the minimal subspace distance then refines the clusters in higher-dimensional subspaces. Our experiments on both synthetic and real data show the effectiveness of the proposed similarity measure and algorithm.
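
The idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the formal definition of the minimal subspace distance, so the code below assumes one plausible reading: the Euclidean distance restricted to the l dimensions in which two points differ least. The function names, the `dims_schedule` parameter, the random initialization, and the fixed iteration count are all hypothetical choices made for illustration and are not taken from the paper.

```python
import random


def minimal_subspace_distance(x, y, l):
    """Assumed definition (not from the paper): Euclidean distance over
    the l dimensions in which x and y are closest to each other."""
    squared_diffs = sorted((a - b) ** 2 for a, b in zip(x, y))
    return sum(squared_diffs[:l]) ** 0.5


def kmeans_subspace(points, k, dims_schedule, iters=10, seed=0):
    """Hypothetical k-means variant: cluster with a low-dimensional
    minimal subspace distance first, then refine the same centers
    while the subspace dimension l increases along dims_schedule."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for l in dims_schedule:
        for _ in range(iters):
            # Assignment step: nearest center under the l-dimensional distance.
            labels = [
                min(range(k),
                    key=lambda c: minimal_subspace_distance(p, centers[c], l))
                for p in points
            ]
            # Update step: each center becomes the mean of its members.
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # keep the old center if a cluster is empty
                    centers[c] = [sum(col) / len(members)
                                  for col in zip(*members)]
    return labels, centers


# The two points agree best in their first two dimensions,
# so the 2-D minimal subspace distance ignores the third one.
print(minimal_subspace_distance((0, 0, 10), (3, 4, 0), 2))  # 5.0

points = [(0, 0, 7), (1, 0, 93), (0, 1, 40), (10, 10, 5), (11, 10, 88), (10, 11, 51)]
labels, centers = kmeans_subspace(points, k=2, dims_schedule=[1, 2, 3])
```

Under this assumed definition, setting l to the full dimensionality recovers the ordinary Euclidean distance, so the last stage of the schedule behaves like standard k-means started from the centers found in the lower-dimensional stages.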