Comparing early and late data fusion methods for gene function prediction

Authors:
Matteo Re;Giorgio Valentini
Affiliations:
DSI, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39, Italy;DSI, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39, Italy
Venue:
Proceedings of the 2009 conference on Neural Nets WIRN09: Proceedings of the 19th Italian Workshop on Neural Nets, Vietri sul Mare, Salerno, Italy, May 28--30 2009
Year:
2009

Abstract

High-throughput biotechnologies are playing an increasingly important role in biomolecular research. Their ability to provide genome-wide views of the molecular mechanisms occurring in living cells could play a crucial role in elucidating biomolecular processes at the system level, but the datasets produced with these techniques are often high-dimensional and very noisy, which makes their analysis challenging because relevant information must be extracted from a sea of noise. Gene function prediction is a central problem in modern bioinformatics, and recent work has pointed out that gene function prediction performance can be improved by integrating heterogeneous biomolecular data sources. In this contribution we compared the performance achievable in gene function prediction by early and late data fusion methods. Given that, among the available late fusion methods, ensemble systems have not yet been extensively investigated, all the late fusion experiments were performed using multiple classifier systems. Experimental results show that late fusion of heterogeneous datasets by means of ensemble systems outperformed both early fusion approaches and base learners trained on single types of biomolecular data.
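
The distinction between the two fusion strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the base learners or the combination rule, so the `CentroidScorer` class, the `early_fusion` and `late_fusion` helpers, the unweighted averaging rule, and the toy data below are all assumptions introduced for illustration. Early fusion is taken here to mean concatenating the feature vectors from each data source before training one model, and late fusion to mean training one model per source and combining their outputs.

```python
import math


def early_fusion(views):
    """Early fusion (assumed form): concatenate each gene's feature
    vectors from all data sources into a single vector."""
    n_genes = len(views[0])
    return [[v for view in views for v in view[i]] for i in range(n_genes)]


def late_fusion(prob_lists, weights=None):
    """Late fusion (assumed form): weighted average of the per-source
    scores, one list of scores per data source."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_genes = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_genes)
    ]


class CentroidScorer:
    """Hypothetical stand-in base learner: scores a gene by how much
    closer it lies to the positive-class centroid than the negative one."""

    def fit(self, X, y):
        self.pos = self._centroid([x for x, t in zip(X, y) if t == 1])
        self.neg = self._centroid([x for x, t in zip(X, y) if t == 0])
        return self

    @staticmethod
    def _centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def predict_proba(self, X):
        # Logistic squashing of (distance to negative - distance to positive).
        return [
            1.0 / (1.0 + math.exp(math.dist(x, self.pos) - math.dist(x, self.neg)))
            for x in X
        ]


# Toy data invented for this sketch: two data sources describing four genes.
expression = [[0.1, 0.2], [0.0, 0.3], [0.9, 0.8], [1.0, 0.7]]
interactions = [[0.0], [0.2], [0.8], [1.0]]
labels = [0, 0, 1, 1]  # 1 = gene annotated with the functional class

query_expression = [[0.8, 0.9]]
query_interactions = [[0.9]]

# Early fusion: one learner trained on the concatenated features.
early_model = CentroidScorer().fit(early_fusion([expression, interactions]), labels)
early_score = early_model.predict_proba(
    early_fusion([query_expression, query_interactions])
)

# Late fusion: one learner per data source, outputs averaged afterwards.
per_source = [
    CentroidScorer().fit(view, labels).predict_proba(query)
    for view, query in zip(
        [expression, interactions], [query_expression, query_interactions]
    )
]
late_score = late_fusion(per_source)

print("early fusion score:", early_score[0])
print("late fusion score:", late_score[0])
```

In this sketch the `weights` argument of `late_fusion` leaves room for weighting each source, for example by its estimated accuracy. Whether the ensembles in the paper use weighted or unweighted combination is not stated in the abstract.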