The performance of traditional clustering algorithms generally degrades on high-dimensional data, for two reasons: some dimensions are likely to be irrelevant or to contain noisy data, and randomly selected initial cluster centres can cause the clustering to converge to a local minimum. In this paper, we propose a framework for clustering high-dimensional data that combines attribute subset selection with efficient cluster centre initialization. In the first phase, rough set theory is used to determine the relevant attributes (dimensions). In the second phase, the dimension of maximum variance is used to determine the initial cluster centres. In the third phase, the k-means clustering algorithm is applied with these initial centres to cluster the data set. The framework substantially improves the efficiency of the clustering process, and our experiment on a test data set shows that the accuracy of the results improves considerably.
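The second and third phases can be illustrated with a minimal sketch. The abstract does not say how the maximum-variance dimension is turned into centres, so the rule used here is an assumption: sort the points along the highest-variance dimension, cut them into k contiguous groups, and take the middle point of each group as a centre. The function names `max_variance_init` and `kmeans` are hypothetical, and the rough set attribute selection of the first phase is not shown, so the input is assumed to contain only the relevant attributes already.

```python
from statistics import pvariance


def max_variance_init(points, k):
    """Pick k initial centres along the highest-variance dimension.

    Assumed rule (not taken from the paper): sort on that dimension,
    split into k contiguous groups, use each group's middle point.
    Assumes 1 <= k <= len(points).
    """
    n_dims = len(points[0])
    # Dimension whose values have the largest population variance.
    dim = max(range(n_dims), key=lambda d: pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[dim])
    centres = []
    for i in range(k):
        start = i * len(ordered) // k
        end = (i + 1) * len(ordered) // k
        centres.append(list(ordered[(start + end) // 2]))
    return centres


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centres, max_iter=100):
    """Standard Lloyd iterations starting from the given centres."""
    centres = [list(c) for c in centres]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centre for every point.
        new_labels = [
            min(range(len(centres)), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centres are too
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels


# Toy data: two groups separated along the first (highest-variance) dimension.
data = [(1.0, 5.0), (1.2, 5.1), (0.9, 4.9), (8.0, 5.0), (8.2, 5.2), (7.9, 4.8)]
initial = max_variance_init(data, 2)
centres, labels = kmeans(data, initial)
# labels -> [0, 0, 0, 1, 1, 1]
```

Because the initial centres are computed deterministically from the data, repeated runs give the same clustering, unlike random initialization.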