Gene-pair representation and incorporation of GO-based semantic similarity into classification of gene expression data

Authors:
Torsten Schön; Alexey Tsymbal; Martin Huber
Affiliations:
Hochschule Weihenstephan-Triesdorf, Freising, Germany; Corporate Technology Division, Siemens AG, Erlangen, Germany; Corporate Technology Division, Siemens AG, Erlangen, Germany
Venue:
Intelligent Data Analysis - Combined Learning Methods and Mining Complex Data
Year:
2012

Abstract

In this work, a novel data representation for learning from gene expression data is introduced, which aims to emphasize gene-gene interactions in learning. With this representation, the data comprise the differences in the expression values of gene pairs rather than the expression values themselves. An important benefit of this representation, besides its better sensitivity to gene interactions, is the opportunity to incorporate external knowledge in the form of semantic similarity between the genes in each pair, which is also studied. In this context, two common learning algorithms, plain k-NN classification and Random Forest, are compared with two techniques based on distance function learning, learning from equivalence constraints and the intrinsic Random Forest similarity, on a set of genetic benchmark datasets. The most discriminative gene pairs are selected and the new representation is evaluated on the benchmark data. The novel representation is shown to increase classification accuracy for genetic datasets. Exploiting the gene-pair representation and the Gene Ontology (GO), the semantic similarity of gene pairs is calculated and used to pre-select pairs with a high similarity value. The GO-based feature selection approach is compared with conventional feature selection and is shown to often increase classification accuracy.
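
As an illustration of the representation described in the abstract, the following minimal sketch maps an expression matrix to pairwise differences and then pre-selects pairs by a similarity score. It is not the authors' implementation. The function names `gene_pair_features` and `preselect_pairs`, the `top_k` parameter, and the precomputed gene-by-gene `similarity` matrix are all assumptions made for this example; the abstract does not say which GO-based similarity measure or selection rule is used.

```python
import numpy as np


def gene_pair_features(X):
    """Map an (n_samples, n_genes) expression matrix to gene-pair differences.

    Returns the (n_samples, n_pairs) matrix of x_i - x_j for every pair i < j,
    together with the list of (i, j) index pairs in column order.
    """
    X = np.asarray(X, dtype=float)
    i, j = np.triu_indices(X.shape[1], k=1)  # all index pairs with i < j
    pairs = list(zip(i.tolist(), j.tolist()))
    return X[:, i] - X[:, j], pairs


def preselect_pairs(pairs, similarity, top_k):
    """Return column indices of the top_k pairs with the highest similarity.

    `similarity` is assumed to be a precomputed symmetric gene-by-gene matrix
    (for example, derived from GO annotations); how it is computed is outside
    the scope of this sketch.
    """
    similarity = np.asarray(similarity, dtype=float)
    scores = np.array([similarity[a, b] for a, b in pairs])
    order = np.argsort(-scores, kind="stable")[:top_k]
    return sorted(order.tolist())


# Toy data: 2 samples, 3 genes (values are made up for illustration).
X = np.array([[1.0, 3.0, 6.0],
              [2.0, 2.0, 5.0]])
sim = np.array([[1.0, 0.2, 0.9],
                [0.2, 1.0, 0.5],
                [0.9, 0.5, 1.0]])

D, pairs = gene_pair_features(X)
keep = preselect_pairs(pairs, sim, top_k=2)
D_selected = D[:, keep]  # pair-difference features passed on to a classifier
```

The number of pairs grows quadratically with the number of genes, which is one reason a pre-selection step such as the similarity-based filter above can be useful before training a classifier on the pair features.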