Relevant gene selection using normalized cut clustering with maximal compression similarity measure

  • Authors:
  • Rajni Bala; R. K. Agrawal; Manju Sardana

  • Affiliations:
  • Deen Dayal Upadhyaya College, University of Delhi, Delhi, India; School of Computer and System Science, Jawaharlal Nehru University, New Delhi, India; School of Computer and System Science, Jawaharlal Nehru University, New Delhi, India

  • Venue:
  • PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
  • Year:
  • 2010

Abstract

Microarray cancer classification has drawn the attention of the research community in the last few years as a route to better clinical diagnosis. Microarray datasets are characterized by high dimensionality and small sample size. To avoid the curse of dimensionality, good feature selection methods are needed. Here, we propose a two-stage algorithm for finding a small subset of relevant genes responsible for classification in high-dimensional microarray datasets. In the first stage, the entire feature space is divided into k clusters using normalized cut. The similarity measure used for clustering is the maximal information compression index. An informative gene is selected from each cluster using the t-statistic, and a pool of non-redundant genes is created. In the second stage, a wrapper-based forward feature selection method is used to obtain a set of optimal genes for a given classifier. The proposed algorithm is tested on three well-known datasets from the Kent Ridge Biomedical Data Repository. Comparison with other state-of-the-art methods shows that the proposed algorithm achieves better classification accuracy with fewer features.
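
The two stages described in the abstract can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation. The maximal information compression index is computed as the smaller eigenvalue of the 2x2 covariance matrix of a gene pair. Everything else is an assumption made for the example: the exponential conversion from index to affinity, a two-way cut in place of the k-way clustering, the Welch form of the t-statistic, picking the gene with the largest absolute t per cluster, and the function names and the `score_fn` callback, all of which are hypothetical.

```python
import numpy as np

def mici_matrix(X):
    """Pairwise maximal information compression index for the columns (genes)
    of X: the smaller eigenvalue of each 2x2 covariance matrix. It is 0 when
    two genes are linearly dependent, so it behaves as a dissimilarity."""
    C = np.cov(X, rowvar=False, bias=True)
    v = np.diag(C)
    total = v[:, None] + v[None, :]
    diff = v[:, None] - v[None, :]
    lam = 0.5 * (total - np.sqrt(diff ** 2 + 4.0 * C ** 2))
    return np.clip(lam, 0.0, None)  # guard against tiny negative round-off

def ncut_bipartition(W):
    """Two-way normalized cut relaxation (assumed simplification of k-way):
    threshold the second-smallest generalized eigenvector of
    (D - W) y = lambda * D y at zero."""
    d_isqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_isqrt[:, None] * W * d_isqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues come back in ascending order
    return (d_isqrt * vecs[:, 1] > 0).astype(int)

def t_statistic(X, y):
    """Absolute two-class Welch t-statistic for every gene (column)."""
    a, b = X[y == 0], X[y == 1]
    num = a.mean(axis=0) - b.mean(axis=0)
    den = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(num / den)

def pick_representatives(X, y, labels):
    """Stage 1 output: from each cluster, keep the gene with the largest |t|."""
    t = t_statistic(X, y)
    reps = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        reps.append(int(members[np.argmax(t[members])]))
    return reps

def forward_select(candidates, score_fn):
    """Stage 2: greedy wrapper. score_fn(subset) is a hypothetical callback,
    e.g. cross-validated accuracy of the chosen classifier on that subset."""
    chosen, best, remaining = [], -np.inf, list(candidates)
    while remaining:
        scores = [score_fn(chosen + [g]) for g in remaining]
        i = int(np.argmax(scores))
        if scores[i] <= best:  # stop when no candidate improves the score
            break
        best = scores[i]
        chosen.append(remaining.pop(i))
    return chosen, best

# Toy data: genes 0-2 follow a class-dependent signal, genes 3-5 follow noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
a = rng.normal(size=40) + 2.0 * y
b = rng.normal(size=40)
X = np.column_stack([a + 0.1 * rng.normal(size=40) for _ in range(3)] +
                    [b + 0.1 * rng.normal(size=40) for _ in range(3)])
lam = mici_matrix(X)
W = np.exp(-lam / (lam.mean() + 1e-12))  # assumed index-to-affinity mapping
labels = ncut_bipartition(W)
reps = pick_representatives(X, y, labels)
```

On the toy data, `reps` holds one gene index per cluster, and `forward_select(reps, score_fn)` would then grow the final gene subset for whichever classifier `score_fn` evaluates. A k-way clustering could be obtained by applying the bipartition recursively or by using an existing spectral clustering routine with a precomputed affinity matrix.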