Protein loop closure using orientational restraints from NMR data

  • Authors:
  • Chittaranjan Tripathy;Jianyang Zeng;Pei Zhou;Bruce Randall Donald

  • Affiliations:
  • Department of Computer Science, Duke University, Durham, NC;Department of Computer Science, Duke University, Durham, NC;Department of Biochemistry, Duke University Medical Center, Durham, NC;Department of Computer Science, Duke University, Durham, NC and Department of Biochemistry, Duke University Medical Center, Durham, NC

  • Venue:
  • RECOMB'11: Proceedings of the 15th Annual International Conference on Research in Computational Molecular Biology
  • Year:
  • 2011

Abstract

Protein loops often play important roles in biological functions such as binding, recognition, catalytic activity and allosteric regulation. Modeling loops that are biophysically sensible is crucial to determining the functional specificity of a protein. A variety of algorithms, ranging from robotics-inspired inverse kinematics methods to fragment-based homology modeling techniques, have been developed to predict protein loops. However, determining the 3D structures of loops using global orientational restraints on internuclear vectors, such as those obtained from residual dipolar coupling (RDC) data in solution Nuclear Magnetic Resonance (NMR) spectroscopy, has not been well studied. In this paper, we present a novel algorithm that determines protein loop conformations using a minimal amount of RDC data. Our algorithm exploits the interplay between the sphero-conics derived from RDCs and the protein kinematics, and formulates the loop structure determination problem as a system of low-degree polynomial equations that can be solved exactly and in closed form. The roots of these polynomial equations, which encode the candidate conformations, are searched systematically, using efficient and provable pruning strategies that triage the vast majority of conformations, to enumerate or prune all possible loop conformations consistent with the data. Our algorithm guarantees completeness by ensuring that a possible loop conformation consistent with the data is never missed. This data-driven algorithm provides a way to assess the structural quality from experimental data with minimal modeling assumptions. We applied our algorithm to compute the loops of human ubiquitin, the FF Domain 2 of human transcription elongation factor CA150 (FF2), the DNA damage inducible protein I (DinI) and the third IgG-binding domain of Protein G (GB3) from experimental RDC data.
A comparison of our results versus those obtained by using traditional structure determination protocols on the same data shows that our algorithm is able to achieve higher accuracy: a 3- to 6-fold improvement in backbone RMSD. In addition, computational experiments on synthetic RDC data for a set of protein loops of length 4, 8 and 12 used in previous studies show that, whenever sparse RDCs can be measured, our algorithm can compute longer loops with high accuracy. These results demonstrate that our algorithm can be successfully applied to compute loops with high accuracy from a limited amount of NMR data. Our algorithm will be useful to determine high-quality complete protein backbone conformations, which will benefit the nuclear Overhauser effect (NOE) assignment process in high-resolution protein structure determination.
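
The sphero-conic constraint mentioned in the abstract can be illustrated with the standard RDC equation r = D_max v^T S v, where v is a unit internuclear vector and S is the symmetric, traceless Saupe (alignment) matrix. The following is a minimal sketch, not the authors' implementation: the function names, the tensor values, and the choice to work in the principal order frame are all assumptions made for illustration. In that frame, a single RDC together with the unit-length condition gives two equations that are linear in x², y² and z², so fixing z yields x² and y² in closed form.

```python
import numpy as np


def principal_saupe(s_yy, s_zz):
    """Diagonal, traceless Saupe matrix in the principal order frame.

    Hypothetical helper: s_xx is implied by the zero-trace condition.
    """
    return np.diag([-(s_yy + s_zz), s_yy, s_zz])


def back_compute_rdc(v, saupe, d_max=1.0):
    """Back-compute an RDC as r = d_max * v^T S v for the normalized vector v."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return d_max * float(v @ saupe @ v)


def xy_squared_given_z(z, r, saupe, d_max=1.0):
    """Solve for (x^2, y^2) on the RDC curve at a fixed z, in the principal frame.

    Uses the two equations
        x^2 + y^2 + z^2 = 1
        s_xx x^2 + s_yy y^2 + s_zz z^2 = r / d_max
    Assumes a diagonal Saupe matrix with s_xx != s_yy.
    Returns None when no real unit vector satisfies both equations.
    """
    s_xx, s_yy, s_zz = np.diag(saupe)
    rhs = r / d_max - s_zz * z**2
    x2 = (rhs - s_yy * (1.0 - z**2)) / (s_xx - s_yy)
    y2 = 1.0 - z**2 - x2
    if x2 < 0.0 or y2 < 0.0:
        return None
    return x2, y2


# Illustrative values only; these are not taken from the paper.
saupe = principal_saupe(s_yy=2e-4, s_zz=5e-4)
v = np.array([1.0, 2.0, 2.0]) / 3.0
r = back_compute_rdc(v, saupe)
solution = xy_squared_given_z(v[2], r, saupe)
```

Each valid (x², y²) pair corresponds to up to four sign choices for (x, y), which hints at why a systematic search over discrete roots is needed. The paper's actual formulation couples such constraints with backbone kinematics, which this sketch does not attempt.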