Combining One-Class Classification Models Based on Diverse Biological Data for Prediction of Protein-Protein Interactions

Authors:
José A. Reyes; David Gilbert
Affiliations:
Bioinformatics Research Centre, Department of Computing Science, University of Glasgow, Glasgow, UK G12 8QQ and Facultad de Ingeniería, Universidad de Talca, Chile; Bioinformatics Research Centre, Department of Computing Science, University of Glasgow, Glasgow, UK G12 8QQ
Venue:
DILS '08: Proceedings of the 5th International Workshop on Data Integration in the Life Sciences
Year:
2008

Abstract

This research addresses the prediction of protein-protein interactions (PPI) by integrating diverse biological data. The Gold Standard data sets frequently used for this task contain a high proportion of instances involving ribosomal proteins. We demonstrate that this over-representation biases the classification results and that predicting PPI that do not involve ribosomal proteins is a much more difficult task. To improve performance on this subtask, we integrated additional biological data into the classification process, including data from mRNA expression experiments and protein secondary structure information. We also investigated several strategies for combining diverse one-class classification (OCC) models generated from different subsets of the biological data. The weighted average combination approach gives the best results, significantly improving on the performance of any individual classification model evaluated.
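
The weighted average combination mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are chosen or how the individual OCC models score a protein pair, so everything specific here is an assumption: each model is taken to emit a score in [0, 1], the weights are taken to come from some validation measure, and the source names, score values, weights, and 0.5 threshold are hypothetical. It is not the authors' implementation.

```python
from typing import Dict, Sequence


def weighted_average(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-model scores into one score using normalised weights."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total


def combine_models(model_scores: Dict[str, float],
                   model_weights: Dict[str, float]) -> float:
    """Weighted average over models, matched by data-source name."""
    names = sorted(model_scores)
    return weighted_average(
        [model_scores[name] for name in names],
        [model_weights[name] for name in names],
    )


# Hypothetical scores for one candidate protein pair, one per OCC model.
# Each model is assumed to be trained on a different subset of the data.
pair_scores = {
    "mrna_expression": 0.8,
    "secondary_structure": 0.4,
    "other_features": 0.6,  # placeholder for any further data source
}

# Hypothetical weights, e.g. derived from each model's validation performance.
validation_weights = {
    "mrna_expression": 0.5,
    "secondary_structure": 0.25,
    "other_features": 0.25,
}

combined = combine_models(pair_scores, validation_weights)

# Assumed decision threshold; in practice it would be tuned on held-out data.
predicted_interaction = combined >= 0.5
print(f"combined score: {combined:.2f}, interaction: {predicted_interaction}")
```

Because the weights are normalised inside `weighted_average`, they can be given on any positive scale. A model with a larger weight pulls the combined score toward its own prediction, which is what lets a reliable data source outweigh a noisier one.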