Handling missing features with boosting algorithms for protein-protein interaction prediction

  • Authors: Fabrizio Smeraldi; Michael Defoin-Platel; Mansoor Saqi

  • Affiliations: School of Electronic Engineering and Computer Science, Queen Mary University of London, London; Biomathematics and Bioinformatics, Rothamsted Research, Harpenden; Biomathematics and Bioinformatics, Rothamsted Research, Harpenden

  • Venue: DILS'10: Proceedings of the 7th International Conference on Data Integration in the Life Sciences
  • Year: 2010

Abstract

Combining information from multiple heterogeneous data sources can aid the prediction of protein-protein interactions (PPI). This information can be arranged into a feature vector for classification. However, missing values in the data can affect prediction accuracy. Boosting has emerged as a powerful tool for feature selection and classification. Bayesian methods have traditionally been used to cope with missing data, with boosting applied to the output of Bayesian classifiers. We explore a variation of AdaBoost that deals with missing values at the level of the boosting algorithm itself, without the need for any density estimation step. Experiments on a publicly available PPI dataset suggest that this simpler and mathematically coherent approach may be more accurate.
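
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of one way missing values can be handled inside boosting itself, and should not be read as the authors' method. It assumes decision stumps that abstain (output 0) whenever their feature is missing, in the style of confidence-rated boosting with abstaining weak hypotheses: each stump is scored by Z = W0 + 2·sqrt(W+ · W−), where W+, W− and W0 are the total weights of correctly classified, misclassified and abstained examples, and its vote is alpha = 0.5·ln(W+ / W−). The function names, the NaN encoding of missing values and the smoothing constant `eps` are all assumptions made for this example.

```python
import numpy as np


def _stump_output(col, thr, sign):
    """Stump prediction in {-1, 0, +1}; 0 means 'abstain' (feature missing)."""
    present = ~np.isnan(col)
    filled = np.where(present, col, 0.0)  # placeholder value, masked out below
    return np.where(present, np.where(filled >= thr, sign, -sign), 0)


def fit_abstaining_adaboost(X, y, n_rounds=10, eps=1e-10):
    """Hypothetical sketch. X: (n, d) floats with np.nan for missing values;
    y: labels in {-1, +1}. Returns a list of (feature, threshold, sign, alpha)."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)  # example weights
    model = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            col = X[:, j]
            for thr in np.unique(col[~np.isnan(col)]):
                for sign in (1, -1):
                    h = _stump_output(col, thr, sign)
                    w_pos = w[h * y > 0].sum()   # weight classified correctly
                    w_neg = w[h * y < 0].sum()   # weight misclassified
                    w_zero = w[h == 0].sum()     # weight the stump abstains on
                    z = w_zero + 2.0 * np.sqrt(w_pos * w_neg)
                    if best is None or z < best[0]:
                        best = (z, j, thr, sign, w_pos, w_neg, h)
        if best is None:  # every feature is missing everywhere
            break
        _, j, thr, sign, w_pos, w_neg, h = best
        # eps (an assumption of this sketch) keeps alpha finite when w_neg is 0
        alpha = 0.5 * np.log((w_pos + eps) / (w_neg + eps))
        model.append((j, thr, sign, alpha))
        # Abstained examples (h == 0) keep their weight before renormalisation
        w = w * np.exp(-alpha * y * h)
        w = w / w.sum()
    return model


def predict_abstaining_adaboost(model, X):
    """Sign of the weighted vote; stumps with a missing feature contribute 0."""
    score = np.zeros(X.shape[0])
    for j, thr, sign, alpha in model:
        score += alpha * _stump_output(X[:, j], thr, sign)
    return np.where(score >= 0, 1, -1)
```

Because abstained examples keep their weight while correctly classified ones are down-weighted, later rounds tend to favour features that are present where earlier stumps had nothing to say. No imputation or density estimate is involved at any point in this sketch.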