Letters: Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines

Authors:
Matteo Re;Giorgio Valentini
Affiliations:
Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39, Italy;Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39, Italy
Venue:
Neurocomputing
Year:
2010

Abstract

Several solutions have been proposed to exploit heterogeneous sources of biomolecular data for gene function prediction, but little attention has been paid to evaluating how much functional classification results could improve through data fusion realized by ensemble-based techniques. In this contribution we test the performance of several ensembles of support vector machine (SVM) classifiers, in which each component learner is trained on a different type of biomolecular data and the component outputs are then combined into a consensus prediction using different aggregation techniques. Experimental results using data obtained with different high-throughput biotechnologies show that simple ensemble methods outperform both learning machines trained on single homogeneous types of biomolecular data and vector space integration methods.
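The aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes each base learner (for example, an SVM trained on one data type) has already produced a probability-like score in [0, 1] for a single functional class. The function names, the toy scores, and the weights are hypothetical and not taken from the paper. The title names decision templates; the two averaging rules are included only as common examples of simple combiners, and the decision template shown is a simplified binary variant that compares a gene's score vector with the mean score vector of each class.

```python
# Hypothetical sketch: late fusion of per-source classifier scores.
# Each inner list holds one gene's scores from three base learners
# (assumed to be one learner per biomolecular data type).

def average_rule(scores):
    """Unweighted mean of the base-learner scores for one gene."""
    return sum(scores) / len(scores)

def weighted_average_rule(scores, weights):
    """Weighted mean; the weights are assumed to reflect each learner's reliability."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def fit_decision_templates(profiles, labels):
    """Template of a class = mean score vector over training genes of that class."""
    templates = {}
    for cls in set(labels):
        rows = [p for p, y in zip(profiles, labels) if y == cls]
        templates[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return templates

def predict_decision_template(profile, templates):
    """Return the class whose template is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda cls: sq_dist(profile, templates[cls]))

# Toy training scores (invented for illustration): 1 = gene has the function.
train_profiles = [[0.9, 0.8, 0.7], [0.8, 0.6, 0.9], [0.2, 0.3, 0.1], [0.1, 0.4, 0.2]]
train_labels = [1, 1, 0, 0]
templates = fit_decision_templates(train_profiles, train_labels)

new_gene = [0.7, 0.4, 0.8]           # scores from the three base learners
avg = average_rule(new_gene)
wavg = weighted_average_rule(new_gene, [0.5, 0.2, 0.3])
predicted = predict_decision_template(new_gene, templates)
print(avg, wavg, predicted)
```

In this toy case the second learner is unsure (0.4), yet all three combiners still favour the positive class because the other two sources agree. This is the kind of complementary evidence that late fusion is meant to exploit.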