New gene subset selection approaches based on linear separating genes and gene-pairs

  • Authors:
  • Amirali Jafarian; Alioune Ngom; Luis Rueda

  • Affiliations:
  • School of Computer Science, University of Windsor, Windsor, Ontario, Canada (all three authors)

  • Venue:
  • PRIB'11: Proceedings of the 6th IAPR International Conference on Pattern Recognition in Bioinformatics
  • Year:
  • 2011

Abstract

The concept of linear separability of gene expression data sets with respect to two classes has recently been studied in the literature. The problem is to efficiently find all pairs of genes that induce a linear separation of the data. It has been suggested that an underlying molecular mechanism relates the two genes of a separating pair to the phenotype under study, such as a specific cancer. In this paper, we study the Containment Angle (CA), defined on the unit circle for a linearly separating gene-pair (LS-pair), as an alternative to the paired t-test ranking function for gene selection. Using the CA, we also show empirically that a given classifier's error is related to the degree of linear separability of a given data set. Finally, we propose gene subset selection methods based on the CA ranking function for LS-pairs and on a ranking function for linearly separating genes (LS-genes); these methods select only among LS-genes and LS-pairs. On many data sets, our methods yield smaller gene subsets and better classification accuracy than a well-performing method.
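The abstract defines LS-genes and LS-pairs only informally. As an illustration, and assuming two classes coded 0 and 1 and that "linear separation" means strict separation by a threshold (one gene) or by a line in the plane spanned by the two genes' expression values (a gene pair), the Python sketch below tests both conditions by brute force. It is not the efficient enumeration algorithm the abstract refers to, and it does not compute the Containment Angle; the names `is_ls_gene`, `is_ls_pair` and `find_ls_pairs` and the toy data are hypothetical. The pair test relies on one observation: for a direction w, the gap between the smallest class-1 projection and the largest class-0 projection can only change sign at directions perpendicular to some difference between a class-1 and a class-0 sample, so checking one direction strictly between each pair of consecutive "critical" angles is exhaustive.

```python
import math
from itertools import combinations

# Hypothetical helpers for illustration only; labels are assumed to be 0/1.


def is_ls_gene(values, labels):
    """True if one gene's values are strictly split by a threshold:
    every sample of one class lies below every sample of the other."""
    a = [v for v, c in zip(values, labels) if c == 0]
    b = [v for v, c in zip(values, labels) if c == 1]
    return max(a) < min(b) or max(b) < min(a)


def _gap(theta, a_pts, b_pts):
    """min over class 1 of w.x minus max over class 0 of w.x, with w at angle
    theta. A positive gap means some line with normal w separates the classes."""
    wx, wy = math.cos(theta), math.sin(theta)
    max_a = max(wx * x + wy * y for x, y in a_pts)
    min_b = min(wx * x + wy * y for x, y in b_pts)
    return min_b - max_a


def is_ls_pair(g1, g2, labels):
    """Brute-force test of strict linear separability in the (g1, g2) plane."""
    a_pts = [(x, y) for x, y, c in zip(g1, g2, labels) if c == 0]
    b_pts = [(x, y) for x, y, c in zip(g1, g2, labels) if c == 1]
    # Critical angles: directions perpendicular to some difference b - a.
    crit = set()
    for ax, ay in a_pts:
        for bx, by in b_pts:
            dx, dy = bx - ax, by - ay
            if dx == 0 and dy == 0:
                continue  # identical samples in both classes: never separable
            base = math.atan2(dy, dx)
            for t in (base + math.pi / 2, base - math.pi / 2):
                crit.add(t % (2 * math.pi))
    angles = sorted(crit)
    if not angles:
        candidates = [0.0]
    else:
        # One test direction inside each arc between consecutive critical
        # angles, including the arc that wraps around past 2*pi.
        nxt = angles[1:] + [angles[0] + 2 * math.pi]
        candidates = [(s + e) / 2 for s, e in zip(angles, nxt)]
    return any(_gap(t, a_pts, b_pts) > 0 for t in candidates)


def find_ls_pairs(genes, labels):
    """All index pairs (i, j) whose two genes jointly separate the classes."""
    return [(i, j) for i, j in combinations(range(len(genes)), 2)
            if is_ls_pair(genes[i], genes[j], labels)]


# Toy data (hypothetical): 3 genes x 4 samples, two samples per class.
labels = [0, 0, 1, 1]
genes = [
    [1, 2, 2, 3],  # gene 0: not an LS-gene on its own
    [2, 3, 1, 2],  # gene 1: not an LS-gene on its own
    [0, 1, 1, 0],  # gene 2: duplicates a sample across classes with either gene
]
print([is_ls_gene(g, labels) for g in genes])  # [False, False, False]
print(find_ls_pairs(genes, labels))            # [(0, 1)]
```

In the toy data, genes 0 and 1 each fail as LS-genes, yet together they form an LS-pair (the line where gene 1 equals gene 0 separates the classes), which is the situation that motivates studying pairs rather than single genes. Each pair test costs roughly cubic time in the number of samples, which is acceptable for a sketch but not for scanning every gene pair of a real microarray data set.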