Reduct and variance based clustering of high dimensional dataset

  • Authors:
  • Dharmveer Singh Rajput;P. K. Singh;M. Bhattacharya

  • Affiliations:
  • ABV-Indian Institute of Information Technology and Management, Gwalior, Madhya Pradesh, India (all three authors)

  • Venue:
  • ICDEM'10 Proceedings of the Second international conference on Data Engineering and Management
  • Year:
  • 2010

Abstract

The performance of traditional clustering algorithms degrades on high dimensional data, because some dimensions are likely to be irrelevant or noisy, and randomly selected initial cluster centres can make the clustering converge to a local minimum. In this paper, we propose a framework for clustering high dimensional data that combines attribute subset selection with efficient cluster centre initialization. In the first phase, rough set theory is used to determine the relevant attributes (dimensions). In the second phase, the dimension with maximum variance is used to determine the initial cluster centres. In the third phase, the k-means clustering algorithm is applied with these initial centres to cluster the data set. The framework improves the efficiency of the clustering process, and our experiment on a test data set shows that the accuracy of the results improves considerably.
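
The second and third phases can be illustrated with a minimal Python sketch. The abstract does not give the exact initialization procedure, so the details here are assumptions: the points are sorted along the maximum-variance dimension, split into k contiguous groups, and each group's mean is taken as an initial centre. The function names `max_variance_init` and `kmeans` are hypothetical, and the rough-set attribute selection of the first phase is assumed to have been applied to the data already.

```python
from statistics import pvariance


def max_variance_init(points, k):
    """Pick k initial centres using the dimension with maximum variance.

    Assumed procedure (not specified in the abstract): sort the points along
    that dimension, cut them into k contiguous groups, use each group's mean.
    Requires len(points) >= k.
    """
    n_dims = len(points[0])
    variances = [pvariance([p[d] for p in points]) for d in range(n_dims)]
    dim = variances.index(max(variances))        # maximum-variance dimension
    ordered = sorted(points, key=lambda p: p[dim])
    size, extra = divmod(len(ordered), k)
    centres, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        group = ordered[start:end]
        centres.append([sum(col) / len(group) for col in zip(*group)])
        start = end
    return centres


def kmeans(points, centres, max_iter=100):
    """Standard Lloyd-style k-means starting from the given centres."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(len(centres)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centres[j])))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for j, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centres.append(
                [sum(col) / len(members) for col in zip(*members)]
                if members else old  # keep an empty cluster's centre in place
            )
        if new_centres == centres:               # no change: converged
            break
        centres = new_centres
    return labels, centres


# Toy data: two well-separated groups along the first dimension.
points = [[1.0, 0.1], [1.2, 0.0], [0.8, 0.2],
          [9.0, 0.1], [9.3, 0.0], [8.9, 0.2]]
labels, centres = kmeans(points, max_variance_init(points, 2))
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because the starting centres are derived from the data rather than drawn at random, repeated runs on the same data set give the same clustering, which is the practical motivation for this kind of initialization.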