An efficient unsupervised sample clustering for cancer datasets based on statistical model pre-processing

Authors:
N. Tajunisha; V. Saravanan
Affiliations:
Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, Coimbatore, Tamil Nadu, India; Department of Computer Application, Dr.N.G.P Institute of Technology, Coimbatore, Tamil Nadu, India
Venue:
International Journal of Information Technology and Management
Year:
2012

Abstract

DNA microarray technology measures the expression levels of thousands of genes across different samples in a single experiment. The samples in a gene expression matrix usually correspond to several macroscopic phenotypes related to a disease or drug effect, such as diseased, normal, or drug-treated samples. The goal of sample-based clustering is to find the phenotype structure or substructure of the samples. In this paper, we present a new framework for unsupervised sample-based clustering of microarray data using informative genes. Initial clusters are formed using k-means with fixed initial centroids; a statistical method is then applied to identify informative genes, which are used in turn to obtain an improved clustering. The aim is better cluster discovery on samples by restricting the clustering to informative genes. Compared with existing methods on cancer datasets, the proposed method produced more accurate clusterings.
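
The three-stage framework described in the abstract (initial k-means with fixed centroids, informative-gene selection, re-clustering) can be illustrated with a minimal sketch. The abstract does not say how the initial centroids are fixed or which statistical method scores the genes, so the choices here are assumptions made purely for illustration: the first k samples serve as the fixed centroids, and genes are ranked by an F-like ratio of between-cluster to within-cluster variation. All function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(samples, init_centroids, max_iter=100):
    """Lloyd's k-means started from caller-supplied (fixed) centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(s, centroids[j]))
            for s in samples
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


def gene_scores(samples, labels):
    """Assumed statistic: between-cluster over within-cluster variation per gene."""
    scores = []
    for g in range(len(samples[0])):
        values = [s[g] for s in samples]
        grand = mean(values)
        between = within = 0.0
        for j in sorted(set(labels)):
            group = [v for v, lab in zip(values, labels) if lab == j]
            m = mean(group)
            between += len(group) * (m - grand) ** 2
            within += sum((v - m) ** 2 for v in group)
        scores.append(between / (within + 1e-12))  # guard against zero variance
    return scores


def cluster_with_informative_genes(samples, k, n_genes):
    """Initial k-means, pick the top-scoring genes, then re-cluster on them."""
    initial = kmeans(samples, samples[:k])  # assumption: first k samples as centroids
    scores = gene_scores(samples, initial)
    top = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:n_genes]
    top.sort()
    reduced = [[s[g] for g in top] for s in samples]
    refined = kmeans(reduced, reduced[:k])
    return initial, top, refined


# Toy matrix (rows = samples, columns = genes); genes 0 and 1 separate the
# two phenotypes, genes 2 and 3 are noise. Values are made up for the demo.
toy = [
    [1.0, 9.0, 5.0, 5.1],
    [9.0, 1.0, 5.2, 4.9],
    [1.2, 8.8, 4.9, 5.0],
    [8.9, 1.1, 5.1, 5.2],
    [0.9, 9.1, 5.0, 4.8],
    [9.1, 0.9, 4.8, 5.0],
]
initial_labels, informative, refined_labels = cluster_with_informative_genes(toy, k=2, n_genes=2)
print(initial_labels, informative, refined_labels)
```

On real microarray data the number of retained genes and the scoring statistic would need to be chosen with care; the sketch only shows how the selected genes feed back into a second clustering pass.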