Reduced False Positives in PDZ Binding Prediction Using Sequence and Structural Descriptors

Authors:
John C. Hawkins; Hongbo Zhu; Joan Teyra; M. Teresa Pisabarro
Affiliations:
BIOTEC TU Dresden, Dresden; BIOTEC TU Dresden, Dresden; BIOTEC TU Dresden, Dresden; BIOTEC TU Dresden, Dresden
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2012

Abstract

Identifying the binding partners of proteins is a problem of fundamental importance in computational biology. The PDZ domain is one of the most common and well-studied protein binding domains, which makes it a natural model system for designing protein binding predictors. The standard approach to identifying the binding partners of PDZ domains uses multiple sequence alignments to infer the set of contact residues that are then used in a predictive model. We extend the sequence alignment approach by incorporating structural information to generate descriptors of the binding site geometry. Furthermore, we generate a real-valued score for binary predictions by applying a filter based on models that predict the probability distributions of contact residues at each of the canonical PDZ ligand binding positions. Under training cross-validation, our model produced an order of magnitude more predictions at a false positive proportion (FPP) of 10 percent than our benchmark model chosen from the literature. Evaluated using an independent cross-validation with computationally predicted structures, our model made five times as many predictions as the benchmark model, with a Matthews correlation coefficient (MCC) of 0.33. In addition, our model achieved an FPP of 0.14, compared with 0.25 for the benchmark model.
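
The two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes that FPP means the fraction of positive predictions that are false, FP / (TP + FP); the abstract does not define the term, so this reading is an assumption, as are the function names and the example counts, none of which come from the paper. MCC follows its standard definition from the four confusion-matrix counts.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Standard Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: report 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def false_positive_proportion(tp, fp):
    """ASSUMED definition: share of positive predictions that are wrong."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return 0.0
    return fp / predicted_positive


# Hypothetical counts, for illustration only (not results from the paper).
tp, fp, tn, fn = 50, 10, 30, 10
print(round(matthews_corrcoef(tp, fp, tn, fn), 3))
print(round(false_positive_proportion(tp, fp), 3))
```

Under this reading, fixing the FPP at a threshold such as 10 percent and counting how many predictions a model can make is a way of comparing coverage at equal precision, which is how the abstract frames the comparison with the benchmark.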