Relevant gene selection using normalized cut clustering with maximal compression similarity measure

Authors:
Rajni Bala;R. K. Agrawal;Manju Sardana
Affiliations:
Deen Dayal Upadhyaya College, University of Delhi, Delhi, India;School of Computer and System Science, Jawaharlal Nehru University, New Delhi, India;School of Computer and System Science, Jawaharlal Nehru University, New Delhi, India
Venue:
PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
Year:
2010

Abstract

Microarray cancer classification has drawn the attention of the research community in the last few years as a route to better clinical diagnosis. Microarray datasets are characterized by high dimensionality and small sample size. To avoid the curse of dimensionality, good feature selection methods are needed. Here, we propose a two-stage algorithm for finding a small subset of relevant genes responsible for classification in high-dimensional microarray datasets. In the first stage of the algorithm, the entire feature space is divided into k clusters using normalized cut. The similarity measure used for clustering is the maximal information compression index. The most informative gene is selected from each cluster using the t-statistic, and a pool of non-redundant genes is created. In the second stage, a wrapper-based forward feature selection method is used to obtain a set of optimal genes for a given classifier. The proposed algorithm is tested on three well-known datasets from the Kent Ridge Biomedical Data Repository. Comparison with other state-of-the-art methods shows that the proposed algorithm achieves better classification accuracy with fewer features.
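
The first stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the maximal information compression index of two features is the smallest eigenvalue of their 2x2 covariance matrix (the usual definition of that index), it uses the Welch two-sample t-statistic as a stand-in because the abstract does not say which variant is used, and it takes the clusters as given instead of computing a normalized cut. The names mici, t_statistic and pick_representatives are hypothetical.

```python
import math
from statistics import mean, pvariance, variance


def mici(x, y):
    """Maximal information compression index of two features.

    Assumed definition: smallest eigenvalue of the 2x2 covariance
    matrix of x and y. It is 0 when the features are linearly
    dependent and grows as they become less similar.
    """
    mx, my = mean(x), mean(y)
    vx, vy = pvariance(x), pvariance(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    trace = vx + vy
    det = vx * vy - cov * cov
    # Clamp at zero to guard against tiny negative rounding errors.
    disc = max(trace * trace - 4.0 * det, 0.0)
    return 0.5 * (trace - math.sqrt(disc))


def t_statistic(a, b):
    """Absolute Welch t-statistic (an assumed choice of variant).

    Requires at least two samples per class and nonzero variance
    in at least one class.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se


def pick_representatives(genes, labels, clusters):
    """Select the gene with the highest t-statistic from each cluster.

    genes:    dict mapping gene name -> list of expression values
    labels:   list of 0/1 class labels, one per sample
    clusters: list of lists of gene names (assumed precomputed,
              for example by normalized cut on a MICI-based similarity)
    """
    def score(name):
        pos = [v for v, c in zip(genes[name], labels) if c == 1]
        neg = [v for v, c in zip(genes[name], labels) if c == 0]
        return t_statistic(pos, neg)

    return [max(cluster, key=score) for cluster in clusters]
```

The resulting pool of one gene per cluster would then be passed to the second stage, the wrapper-based forward selection, which is not sketched here because the abstract does not specify the classifier or the stopping rule.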