An Increased Performance of Clustering High Dimensional Data Using Principal Component Analysis

Authors:
N. Tajunisha;V. Saravanan
Affiliations:
-;-
Venue:
ICIIC '10 Proceedings of the 2010 First International Conference on Integrated Intelligent Computing
Year:
2010

Citing 0
Cited 2

An efficient unsupervised sample clustering for cancer datasets based on statistical model pre-processing
International Journal of Information Technology and Management

An architecture for component-based design of representative-based clustering algorithms
Data & Knowledge Engineering

Abstract

In many application domains, such as information retrieval, computational biology, and image processing, the dimensionality of the data is usually very high. Developing effective clustering methods for high-dimensional datasets is a challenging problem because of the curse of dimensionality. The k-means clustering algorithm is used in many practical applications, but it is computationally expensive, and the quality of the resulting clusters depends heavily on the selection of the initial centroids and on the dimensionality of the data. When the dimensionality of the dataset is high, the accuracy of the result may fall short of expectations, because the chosen dataset cannot be assumed to be free of noise and flaws. It is therefore necessary to reduce the dimensionality of the given dataset in order to improve efficiency and accuracy. This paper proposes a new approach that improves the accuracy of the clustering results by using PCA both to determine the initial centroids and to reduce the dimensionality of the data.
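
The abstract does not spell out how PCA is used to pick the initial centroids, so the following Python sketch is only one plausible reading, not the authors' method. It assumes NumPy is available, reduces the data with PCA, and seeds k-means with a hypothetical rule: sort the points by their score on the first principal component, cut them into k equal-sized groups, and take each group's mean. The function names (`pca_project`, `pca_initial_centroids`, `kmeans`) and the synthetic data are illustrative assumptions.

```python
import numpy as np


def pca_project(X, n_components):
    """Center X and project it onto its top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def pca_initial_centroids(Z, k):
    """Assumed seeding rule (not taken from the paper): sort points by
    first-PC score, split into k equal-sized groups, use group means."""
    order = np.argsort(Z[:, 0])
    return np.array([Z[idx].mean(axis=0) for idx in np.array_split(order, k)])


def kmeans(Z, centroids, n_iter=100):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(n_iter):
        # Squared distance from every point to every centroid.
        d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Synthetic data (an assumption for illustration): 120 points in 50
# dimensions, forming three groups of 40 spread along the first axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50))
X[:, 0] += np.repeat([0.0, 10.0, 20.0], 40)

Z = pca_project(X, n_components=2)            # 50 dimensions reduced to 2
labels, centroids = kmeans(Z, pca_initial_centroids(Z, k=3))
print(np.bincount(labels))                    # cluster sizes
```

Because the seeding is deterministic, repeated runs on the same data give the same clusters, unlike random initialization. The synthetic groups here are deliberately separated along a single direction, which is the case this seeding rule handles best; it says nothing about the accuracy gains reported in the paper.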