Gene Selection for Microarray Expression Data with Imbalanced Sample Distributions

  • Authors:
  • Abu H. M. Kamal;Xingquan Zhu;Ramaswamy Narayanan

  • Venue:
  • IJCBS '09 Proceedings of the 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing
  • Year:
  • 2009

Abstract

Microarray expression data, which contain the expression levels of a large number of simultaneously observed genes, have been used in many scientific and clinical studies. Because such data are high-dimensional, selecting a small number of genes has been shown to be beneficial for tasks such as building prediction models for the molecular classification of cancers. Traditional gene selection methods, however, do not take the sample distribution into account. Because samples are scarce, biomedical research very commonly yields severely biased data distributions in which one class of examples (e.g., diseased samples) has significantly fewer members than the other classes (e.g., normal samples). Sample sets with biased distributions require special attention when identifying the genes responsible for a particular disease. In this paper, we propose three filtering techniques, Higher Weight (HW), Differential Minority Repeat (DMR), and Balanced Minority Repeat (BMR), to identify genes relevant to fatal diseases in biased microarray expression data. Experimental comparisons with the traditional ReliefF method on five microarray datasets demonstrate the effectiveness of the proposed methods in selecting informative genes from microarray expression data with biased sample distributions.
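
To make the idea of a filter-style gene ranking on imbalanced samples concrete, the following is a minimal sketch of basic two-class Relief (a simpler relative of the ReliefF baseline named above) in which updates from minority-class samples are scaled up. It is not the paper's HW, DMR, or BMR procedure, whose definitions the abstract does not give. The function names `relief_scores` and `top_genes`, the `minority_weight` parameter, and the toy data are all assumptions introduced for illustration.

```python
def relief_scores(X, y, minority_label, minority_weight=1.0):
    """Score genes with basic two-class Relief (illustrative sketch only).

    X: list of samples, each a list of expression values (one per gene).
    y: list of class labels; two classes, each with at least two samples.
    minority_weight: hypothetical multiplier applied to the updates that
        come from minority-class samples (1.0 gives plain Relief).
    """
    n_genes = len(X[0])

    # Per-gene value range, used to normalise differences into [0, 1].
    spans = []
    for g in range(n_genes):
        column = [row[g] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def distance(a, b):
        # Range-normalised Manhattan distance between two samples.
        return sum(abs(a[g] - b[g]) / spans[g] for g in range(n_genes))

    scores = [0.0] * n_genes
    total_weight = 0.0
    for i, (xi, yi) in enumerate(zip(X, y)):
        hits = [X[j] for j in range(len(X)) if j != i and y[j] == yi]
        misses = [X[j] for j in range(len(X)) if y[j] != yi]
        hit = min(hits, key=lambda h: distance(xi, h))      # nearest same-class sample
        miss = min(misses, key=lambda m: distance(xi, m))   # nearest other-class sample
        w = minority_weight if yi == minority_label else 1.0
        total_weight += w
        for g in range(n_genes):
            # Reward genes that differ across classes, penalise genes
            # that differ within a class.
            diff = abs(xi[g] - miss[g]) - abs(xi[g] - hit[g])
            scores[g] += w * diff / spans[g]
    return [s / total_weight for s in scores]


def top_genes(scores, k):
    """Return the indices of the k highest-scoring genes."""
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


# Toy data (made up): gene 0 separates the classes, gene 1 is noise.
# Class 1 is the minority, with two of the six samples.
X = [[0.10, 0.5], [0.20, 0.9], [0.15, 0.1], [0.05, 0.7],
     [0.90, 0.6], [1.00, 0.2]]
y = [0, 0, 0, 0, 1, 1]

scores = relief_scores(X, y, minority_label=1, minority_weight=3.0)
selected = top_genes(scores, 1)
```

On this toy set the class-separating gene receives the higher score. Setting `minority_weight` to 1.0 recovers unweighted Relief, which makes it easy to compare how much the ranking shifts as the minority class is emphasised.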