True Path Rule Hierarchical Ensembles for Genome-Wide Gene Function Prediction

Authors: Giorgio Valentini
Affiliations: Università degli Studi di Milano, Milano
Venue: IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year: 2011


Abstract

Gene function prediction is a complex computational problem, characterized by several items: (1) the number of functional classes is large, and a gene may belong to multiple classes; (2) functional classes are structured according to a hierarchy; (3) classes are usually unbalanced, with more negative than positive examples; (4) class labels can be uncertain and the annotations largely incomplete; (5) to improve the predictions, multiple sources of data need to be properly integrated. In this contribution, we focus on the first three items and, in particular, on the development of a new method for hierarchical genome-wide and ontology-wide gene function prediction. The proposed algorithm is inspired by the “true path rule” (TPR) that governs both the Gene Ontology and FunCat taxonomies. Following this rule, the proposed TPR ensemble method is characterized by a two-way asymmetric flow of information that traverses the graph-structured ensemble: positive predictions for a node recursively influence its ancestors, while negative predictions influence its descendants. Cross-validated results with the model organism S. cerevisiae, using seven different sources of biomolecular data, and a theoretical analysis of the TPR algorithm show the effectiveness and the drawbacks of the proposed approach.
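
The two-way asymmetric flow described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state the combination rule, so several details here are assumptions made for illustration. These are the tree-shaped hierarchy (as in FunCat), the averaging of a node's local score with the scores of its positively predicted children, the fixed decision threshold of 0.5, and the names `tpr_ensemble`, `children`, and `local_prob`.

```python
from typing import Dict, List, Tuple


def tpr_ensemble(
    children: Dict[str, List[str]],
    local_prob: Dict[str, float],
    root: str,
    threshold: float = 0.5,
) -> Tuple[Dict[str, float], Dict[str, bool]]:
    """Illustrative true-path-rule ensemble over a tree of classes.

    children   : maps each class to its child classes (assumed tree).
    local_prob : score of the local base classifier for each class.
    threshold  : assumed cut-off above which a score counts as positive.
    """
    consensus: Dict[str, float] = {}

    def bottom_up(node: str) -> float:
        # Positive predictions flow upward: only children whose consensus
        # score is positive contribute to the parent's score.
        child_scores = [bottom_up(c) for c in children.get(node, [])]
        positive = [s for s in child_scores if s > threshold]
        # Assumed combination rule: plain average of the local score and
        # the positive children's consensus scores.
        consensus[node] = (local_prob[node] + sum(positive)) / (1 + len(positive))
        return consensus[node]

    labels: Dict[str, bool] = {}

    def top_down(node: str, parent_positive: bool) -> None:
        # Negative predictions flow downward: a class can be positive
        # only if its parent is positive too.
        labels[node] = parent_positive and consensus[node] > threshold
        for c in children.get(node, []):
            top_down(c, labels[node])

    bottom_up(root)
    top_down(root, True)
    return consensus, labels


# Hypothetical FunCat-like toy hierarchy and base-classifier scores.
toy_children = {"01": ["01.01", "01.02"], "01.01": ["01.01.03"]}
toy_scores = {"01": 0.4, "01.01": 0.6, "01.02": 0.2, "01.01.03": 0.9}
toy_consensus, toy_labels = tpr_ensemble(toy_children, toy_scores, "01")
print(toy_consensus)
print(toy_labels)
```

In this toy run the root class has a local score below the threshold, yet it ends up positive because the confident prediction for its descendant "01.01.03" propagates upward. Conversely, whenever a class stays negative after the bottom-up step, all of its descendants are forced negative in the top-down step, whatever their own scores.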