Identification of multiple gene subsets using multi-objective evolutionary algorithms

  • Authors:
  • A. Raji Reddy;Kalyanmoy Deb

  • Affiliations:
  • Kanpur Genetic Algorithms Laboratory, Indian Institute of Technology Kanpur, Kanpur, India;Kanpur Genetic Algorithms Laboratory, Indian Institute of Technology Kanpur, Kanpur, India

  • Venue:
  • EMO'03 Proceedings of the 2nd international conference on Evolutionary multi-criterion optimization
  • Year:
  • 2003

Abstract

In bioinformatics, identifying the gene subsets responsible for classifying available samples into two or more classes (for example, 'malignant' or 'benign') is an important task. The main difficulties in solving the resulting optimization problem are that only a few samples are available relative to the number of genes in each sample, and that the search space of solutions is exorbitantly large. Although a few applications of evolutionary algorithms (EAs) to this task exist, we treat the problem as a multi-objective optimization problem: minimizing the gene subset size while simultaneously minimizing the number of misclassified samples. Contrary to past studies, we have discovered that a small gene subset (of size four or five, for example) is enough to correctly classify 100% or nearly 100% of the samples for three cancer data sets (Leukemia, Lymphoma, and Colon). Besides a few variants of NSGA-II, in one implementation NSGA-II is modified to find multi-modal non-dominated solutions, discovering as many as 630 different three-gene combinations that each provide 100% correct classification of the Leukemia data. To perform the identification task with more confidence, we have also introduced a threshold on the prediction strength. All simulation results show consistent gene subset identifications on the three disease data sets and demonstrate the flexibility and efficacy of a multi-objective EA for the gene identification task.
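The two objectives named in the abstract (gene subset size and number of misclassified samples) can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier is a stand-in chosen for brevity, the function names (`evaluate`, `dominates`, `non_dominated`) and the toy expression matrix are hypothetical, and the NSGA-II search itself is omitted. The sketch only shows how candidate subsets would be scored and how the non-dominated ones are selected.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[int, int]  # (gene subset size, misclassified samples)


def nearest_centroid_errors(X: Sequence[Sequence[float]],
                            y: Sequence[int],
                            subset: Sequence[int]) -> int:
    """Count training samples misclassified when only genes in `subset` are used.

    Assumption: a nearest-centroid rule stands in for the paper's classifier.
    """
    if not subset:
        return len(y)  # an empty subset cannot classify anything
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[g] for r in rows) / len(rows) for g in subset]
    errors = 0
    for x, label in zip(X, y):
        point = [x[g] for g in subset]
        predicted = min(
            classes,
            key=lambda c: sum((p - m) ** 2 for p, m in zip(point, centroids[c])),
        )
        errors += int(predicted != label)
    return errors


def evaluate(X, y, subset: Sequence[int]) -> Objectives:
    """Both objectives are to be minimized."""
    return (len(subset), nearest_centroid_errors(X, y, subset))


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse in every objective and better in one."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def non_dominated(points: List[Objectives]) -> List[Objectives]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical toy data: 4 samples x 3 genes; only gene 0 separates the classes.
X = [[1.0, 5.0, 2.0], [2.0, 1.0, 2.0], [5.0, 5.0, 2.0], [6.0, 1.0, 2.0]]
y = [0, 0, 1, 1]
candidates = [(0,), (1,), (0, 1), (1, 2)]
scores = [evaluate(X, y, s) for s in candidates]
front = non_dominated(scores)
```

Because `dominates` requires strict improvement in at least one objective, distinct subsets with identical objective values are all retained by `non_dominated`. That is the situation the multi-modal variant described in the abstract targets, where many different gene combinations achieve the same size and the same classification accuracy.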