Using machine learning techniques and genomic/proteomic information from known databases for defining relevant features for PPI classification

Authors:
J. M. Urquiza;I. Rojas;H. Pomares;J. Herrera;J. P. Florido;O. Valenzuela;M. Cepero
Affiliations:
Department of Computer Architecture and Computer Technology, Spain;Department of Computer Architecture and Computer Technology, Spain;Department of Computer Architecture and Computer Technology, Spain;Department of Computer Architecture and Computer Technology, Spain;Department of Computer Architecture and Computer Technology, Spain;Department of Applied Mathematics, University of Granada, 18017 Granada, Spain;Department of Applied Mathematics, University of Granada, 18017 Granada, Spain
Venue:
Computers in Biology and Medicine
Year:
2012


Abstract

In modern proteomics, the prediction of protein-protein interactions (PPIs) is a key line of research, since these interactions take part in most essential biological processes. This paper proposes a new approach to PPI classification based on extracting genomic and proteomic information from well-known databases and incorporating semantic measures. The approach applies data mining techniques and yields accurate models with high sensitivity and specificity in the classification of PPIs. The models are learned with support vector machines (SVMs), which also return a new confidence score that may help expert researchers filter and validate new external PPIs. The study focuses on yeast, one of the most widely analyzed organisms. We processed a very high-confidence dataset by extracting up to 26 specific features from the chosen databases, half of them calculated using two new similarity measures proposed in this paper. Then, by applying a filter-wrapper algorithm for feature selection, we obtained a final set of the eight most relevant features for predicting PPIs, which was validated by ROC analysis. The predictive capability of the SVM model using these eight features was tested by evaluating its predictions on a set of external experimental, computational, and literature-collected datasets.
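
As a rough illustration of the ROC analysis and feature ranking mentioned in the abstract, the following sketch computes the area under the ROC curve (AUC) for each feature on its own and sorts the features accordingly, as the filter stage of a filter-wrapper scheme might. It is only a sketch under stated assumptions: the abstract does not say which filter criterion the authors used, so AUC is an assumed stand-in, and the feature names (`go_similarity`, `domain_overlap`, `random_noise`) and all values are hypothetical toy data, not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative
    one (ties count as one half). Assumes higher score = more likely positive."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features(
    features: Dict[str, List[float]], labels: Sequence[int]
) -> List[Tuple[str, float]]:
    """Filter step (assumed criterion): score each feature alone, best first."""
    scored = [(name, roc_auc(values, labels)) for name, values in features.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 1 = interacting protein pair, 0 = non-interacting pair.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "go_similarity": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
    "domain_overlap": [0.6, 0.2, 0.9, 0.5, 0.4, 0.1],
    "random_noise": [0.1, 0.9, 0.5, 0.9, 0.5, 0.1],
}

ranking = rank_features(features, labels)
for name, auc in ranking:
    print(f"{name}: AUC = {auc:.3f}")
```

In a full filter-wrapper procedure, a ranking like this would only propose candidates. A wrapper stage would then retrain the classifier (an SVM in the paper) on growing subsets of the ranked features and keep those that improve validation performance.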