Gene-pair representation and incorporation of GO-based semantic similarity into classification of gene expression data

  • Authors:
  • Torsten Schön; Alexey Tsymbal; Martin Huber

  • Affiliations:
  • Hochschule Weihenstephan-Triesdorf, Freising, Germany; Corporate Technology Division, Siemens AG, Erlangen, Germany; Corporate Technology Division, Siemens AG, Erlangen, Germany

  • Venue:
  • Intelligent Data Analysis - Combined Learning Methods and Mining Complex Data
  • Year:
  • 2012

Abstract

In this work, a novel data representation for learning from gene expression data is introduced, which aims to emphasize gene-gene interactions during learning. With this representation, the data comprise the differences between the expression values of gene pairs rather than the expression values themselves. Besides improved sensitivity to gene interactions, an important benefit of this representation is the opportunity to incorporate external knowledge in the form of the semantic similarity of the gene pairs, which is also studied. In this context, two common learning algorithms, plain k-NN classification and Random Forest, are compared with two techniques based on distance function learning, learning from equivalence constraints and the intrinsic Random Forest similarity, on a set of genetic benchmark datasets. The most discriminative gene pairs are selected, and the new representation is evaluated on the benchmark data. The novel representation is shown to increase classification accuracy on genetic datasets. Exploiting the gene-pair representation and the Gene Ontology (GO), the semantic similarity of gene pairs is calculated and used to pre-select pairs with high similarity values. This GO-based feature selection approach is compared with common feature selection and is shown to often increase classification accuracy.
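The gene-pair transformation and the GO-based pre-selection described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each pair feature is taken to be the signed difference `x[i] - x[j]` (the abstract does not say whether signed or absolute differences are used), and the GO semantic similarity is supplied as a precomputed lookup table `sim`, because the specific GO similarity measure is not given here. The function names `gene_pair_features` and `preselect_by_similarity`, the toy data, and the threshold value are all hypothetical.

```python
from itertools import combinations


def gene_pair_features(X, gene_names):
    """Map each sample to gene-pair difference features.

    X: list of samples, each a list with one expression value per gene.
    Returns (pair_names, X_pairs), where feature k of a sample is
    x[i] - x[j] for the k-th pair (i, j) with i < j.
    Assumption: signed differences; the original method may differ.
    """
    pairs = list(combinations(range(len(gene_names)), 2))
    pair_names = [(gene_names[i], gene_names[j]) for i, j in pairs]
    X_pairs = [[row[i] - row[j] for i, j in pairs] for row in X]
    return pair_names, X_pairs


def preselect_by_similarity(pair_names, sim, threshold):
    """Return indices of pairs whose GO semantic similarity >= threshold.

    sim: hypothetical precomputed table mapping frozenset({gene_a, gene_b})
    to a similarity score; pairs missing from the table score 0.0.
    """
    return [
        k
        for k, (a, b) in enumerate(pair_names)
        if sim.get(frozenset((a, b)), 0.0) >= threshold
    ]


if __name__ == "__main__":
    genes = ["G1", "G2", "G3"]
    X = [[5.0, 3.0, 1.0], [2.0, 4.0, 4.0]]
    names, X_pairs = gene_pair_features(X, genes)
    print(names)    # [('G1', 'G2'), ('G1', 'G3'), ('G2', 'G3')]
    print(X_pairs)  # [[2.0, 4.0, 2.0], [-2.0, -2.0, 0.0]]

    # Toy similarity values, for illustration only.
    sim = {frozenset(("G1", "G2")): 0.8, frozenset(("G2", "G3")): 0.3}
    keep = preselect_by_similarity(names, sim, threshold=0.5)
    X_selected = [[row[k] for k in keep] for row in X_pairs]
    print(keep, X_selected)  # [0] [[2.0], [-2.0]]
```

Since n genes yield n(n-1)/2 pairs, the pair space grows quadratically with the number of genes, which is one reason some form of pair selection (discriminative or GO-based) matters before the reduced features are passed to a learner such as k-NN or Random Forest.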