Adapting k-means algorithm for discovering clusters in subspaces

Authors:
Yanchang Zhao;Chengqi Zhang;Shichao Zhang;Lianwei Zhao
Affiliations:
Faculty of Information Technology, University of Technology, Sydney, Australia;Faculty of Information Technology, University of Technology, Sydney, Australia;Faculty of Information Technology, University of Technology, Sydney, Australia;Dept. of Computer Science, Beijing Jiaotong University, Beijing, China
Venue:
APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
Year:
2006

Abstract

Subspace clustering is a challenging task in data mining. In very high-dimensional data spaces, traditional distance measures fail to differentiate the furthest point from the nearest point. To tackle this problem, we design the minimal subspace distance, which measures the similarity between two points in the subspace where they are nearest to each other. It can discover subspace clusters implicitly while measuring the similarities between points. We use the new similarity measure to adapt the traditional k-means algorithm for discovering clusters in subspaces. Clustering first with a low-dimensional minimal subspace distance detects the clusters in low-dimensional subspaces; gradually increasing the dimension of the minimal subspace distance then refines the clusters in higher-dimensional subspaces. Our experiments on both synthetic and real data show the effectiveness of the proposed similarity measure and algorithm.
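
The idea in the abstract can be illustrated with a rough sketch. The abstract does not give the formal definition of the minimal subspace distance, so the code below assumes one plausible reading: for a target dimension l, the distance between two points is the Euclidean distance over the l axis-parallel dimensions on which they differ least. The function names, the full-space mean update, the random initialization, and the stopping rule are all assumptions made for illustration and should not be taken as the authors' algorithm.

```python
import random


def minimal_subspace_distance(x, y, l):
    """Assumed definition (not from the paper): Euclidean distance over the
    l axis-parallel dimensions on which x and y are closest."""
    squared_diffs = sorted((a - b) ** 2 for a, b in zip(x, y))
    return sum(squared_diffs[:l]) ** 0.5


def assign(points, centers, l):
    """Label each point with the index of its nearest center under the
    l-dimensional minimal subspace distance."""
    return [
        min(range(len(centers)),
            key=lambda c: minimal_subspace_distance(p, centers[c], l))
        for p in points
    ]


def update(points, labels, centers):
    """Recompute each center as the full-space mean of its members
    (an assumption); an empty cluster keeps its previous center."""
    dim = len(points[0])
    new_centers = []
    for c in range(len(centers)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            new_centers.append(
                [sum(m[d] for m in members) / len(members) for d in range(dim)]
            )
        else:
            new_centers.append(centers[c])
    return new_centers


def subspace_kmeans(points, k, start_dim=1, max_iter=20, seed=0):
    """Run k-means-style iterations with subspace dimension l = start_dim,
    then reuse the resulting centers while l grows up to the full dimension."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for l in range(start_dim, len(points[0]) + 1):
        for _ in range(max_iter):
            new_labels = assign(points, centers, l)
            centers = update(points, new_labels, centers)
            if new_labels == labels:  # assignments stable at this dimension
                break
            labels = new_labels
    return labels, centers
```

For example, `minimal_subspace_distance([0, 0, 0], [3, 4, 100], 2)` ignores the third coordinate, where the points are far apart, and measures the distance only over the two closest dimensions. Carrying the centers over from one value of l to the next is what lets clusters found in low-dimensional subspaces be refined at higher dimensions in this sketch.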