Identifying informative genes for prediction of breast cancer subtypes

  • Authors:
  • Iman Rezaeian; Yifeng Li; Martin Crozier; Eran Andrechek; Alioune Ngom; Luis Rueda; Lisa Porter

  • Affiliations:
  • School of Computer Science, University of Windsor, Windsor, Ontario, Canada; School of Computer Science, University of Windsor, Windsor, Ontario, Canada; Department of Biological Sciences, University of Windsor, Windsor, Ontario, Canada; Department of Physiology, Michigan State University, East Lansing, MI, United States; School of Computer Science, University of Windsor, Windsor, Ontario, Canada; School of Computer Science, University of Windsor, Windsor, Ontario, Canada; Department of Physiology, Michigan State University, East Lansing, MI, United States

  • Venue:
  • PRIB'13: Proceedings of the 8th IAPR International Conference on Pattern Recognition in Bioinformatics
  • Year:
  • 2013

Abstract

Breast cancer is not a single disease but a collection of distinct diseases arising at one site, which can be distinguished in part by characteristic gene expression signatures. Accurate diagnosis of the specific subtype is critical for ensuring the best possible patient response to therapy. Currently, therapeutic direction is determined by the expression of characteristic receptors; while cost-effective, this method is not robust and reliably predicts only a small number of subtypes. Working with the original five subtypes of breast cancer, we hypothesized that machine learning techniques would offer many benefits for feature selection. Unlike existing gene selection approaches, we propose a tree-based approach that conducts gene selection and builds the classifier simultaneously. We conducted computational experiments to select the minimal number of genes that would reliably predict a given subtype. Our results indicate that this modified approach to gene selection yields a small subset of genes that can predict subtypes with greater than 95% overall accuracy. In addition to providing a valuable list of targets for diagnostic purposes, the gene ontologies of the selected genes suggest that these methods have isolated a number of genes potentially involved in breast cancer biology and etiology, and potentially relevant to novel therapeutics.
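
The idea of selecting genes and building the classifier in one pass can be illustrated with a small sketch. The abstract does not specify the tree algorithm used, so what follows is only a generic CART-style decision tree with Gini impurity, written under that assumption and not the authors' method. The expression values, the subtype labels `S1` to `S3`, and all function names are hypothetical toy choices. The point it shows is that the genes chosen as split variables while the tree is grown are, by construction, the selected gene subset.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return the (gene_index, threshold) that most reduces weighted Gini, or None."""
    n = len(y)
    best, best_score = None, gini(y)
    for g in range(len(X[0])):
        values = sorted(set(row[g] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][g] <= t]
            right = [y[i] for i in range(n) if X[i][g] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (g, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Grow the tree; each internal node records the gene it splits on."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return {"label": Counter(y).most_common(1)[0][0]}
    g, t = split
    left = [i for i in range(len(y)) if X[i][g] <= t]
    right = [i for i in range(len(y)) if X[i][g] > t]
    return {
        "gene": g,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }

def selected_genes(tree):
    """Genes used as split variables: the subset selected while building the classifier."""
    if "label" in tree:
        return set()
    return {tree["gene"]} | selected_genes(tree["left"]) | selected_genes(tree["right"])

def predict(tree, sample):
    """Follow the splits from the root down to a leaf label."""
    while "label" not in tree:
        branch = "left" if sample[tree["gene"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["label"]

# Hypothetical toy data: 6 samples x 4 genes, 3 placeholder subtypes.
X = [
    [9.0, 1.0, 5.0, 0.3],
    [8.5, 7.5, 5.2, 0.8],
    [2.0, 1.2, 5.1, 0.5],
    [2.5, 0.8, 4.9, 0.1],
    [2.2, 6.5, 5.0, 0.9],
    [1.8, 7.0, 5.3, 0.4],
]
y = ["S1", "S1", "S2", "S2", "S3", "S3"]

tree = build_tree(X, y)
print(sorted(selected_genes(tree)))
```

On this toy data the tree splits only on genes 0 and 1 and ignores the two uninformative genes, which is the behaviour the abstract describes at a much larger scale. A real analysis would also need held-out evaluation, which this sketch omits.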