A random graph approach to NMR sequential assignment

  • Authors:
  • Chris Bailey-Kellogg;Sheetal Chainraj;Gopal Pandurangan

  • Affiliations:
  • Purdue University, West Lafayette, IN;Purdue University, West Lafayette, IN;Purdue University, West Lafayette, IN

  • Venue:
  • RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
  • Year:
  • 2004

Abstract

Nuclear magnetic resonance (NMR) spectroscopy allows scientists to study protein structure, dynamics, and interactions in solution. A necessary first step for such applications is determining the resonance assignment, which maps spectral data to atoms and residues in the primary sequence. Automated resonance assignment algorithms rely on information regarding connectivity (e.g., through-bond atomic interactions) and amino acid type, typically using the former to determine strings of connected residues and the latter to map those strings to positions in the primary sequence. Significant ambiguity exists in both connectivity and amino acid type, and different algorithms have combined the information either in two phases (find short unambiguous strings, then align) or simultaneously (align while extending strings). This paper focuses on the information content available in connectivity alone, allowing for ambiguity rather than handling only unambiguous strings, and complements existing work on the information content in amino acid type.

In this paper, we develop a novel random-graph-theoretic framework for the algorithmic analysis of NMR sequential assignment. Our random graph model captures the structure of chemical shift degeneracy, a key source of connectivity ambiguity. We then give a simple and natural randomized algorithm for finding an optimum sequential cover. The algorithm naturally and efficiently reuses substrings while exploring connectivity choices; it overcomes local ambiguity by enforcing global consistency of all choices. We employ our random graph model to analyze the algorithm and show that it can provably tolerate relatively large ambiguity while still giving expected optimal performance in polynomial time. To study the algorithm's performance in practice, we tested it on experimental data sets from a variety of proteins and experimental setups. The algorithm was able to overcome significant noise and local ambiguity and consistently identify significant sequential fragments.
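The abstract does not spell out the random graph model or the algorithm, so the following Python sketch is only an illustration under stated assumptions. It is not the authors' method. It assumes a toy model: a hidden chain of true sequential links i -> i+1 among n spin systems, plus spurious directed links, each added independently with probability p, standing in for ambiguity caused by chemical shift degeneracy. The names `planted_path_graph`, `reverse_edges`, and `randomized_extension` are hypothetical. The search grows a single simple path by random choices with restarts. It omits the substring reuse and global-consistency enforcement described above, and it returns one path rather than a full sequential cover.

```python
import random
from typing import Dict, List, Set

# Directed connectivity graph: spin system u -> set of candidate successors v.
Graph = Dict[int, Set[int]]


def planted_path_graph(n: int, p: float, rng: random.Random) -> Graph:
    """Toy model (assumption): true links i -> i+1 for a chain of n spin
    systems, plus every other ordered pair (u, v) added independently with
    probability p to mimic degeneracy-induced ambiguous links."""
    succ: Graph = {u: set() for u in range(n)}
    for u in range(n - 1):
        succ[u].add(u + 1)  # the hidden "true" sequential link
    for u in range(n):
        for v in range(n):
            if u != v and v != u + 1 and rng.random() < p:
                succ[u].add(v)  # spurious link from ambiguous shift matching
    return succ


def reverse_edges(succ: Graph) -> Graph:
    """Build the predecessor map so a path can also be grown backwards."""
    pred: Graph = {u: set() for u in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].add(u)
    return pred


def randomized_extension(succ: Graph, rng: random.Random,
                         restarts: int = 50) -> List[int]:
    """Hypothetical randomized search: from a random seed vertex, grow a
    simple path forward, then backward, picking unused neighbours uniformly
    at random; keep the longest path found over all restarts."""
    pred = reverse_edges(succ)
    nodes = sorted(succ)
    best: List[int] = []
    for _ in range(restarts):
        path = [rng.choice(nodes)]
        used = set(path)
        while True:  # extend the tail with an unused successor
            options = sorted(v for v in succ[path[-1]] if v not in used)
            if not options:
                break
            path.append(rng.choice(options))
            used.add(path[-1])
        while True:  # extend the head with an unused predecessor
            options = sorted(u for u in pred[path[0]] if u not in used)
            if not options:
                break
            path.insert(0, rng.choice(options))
            used.add(path[0])
        if len(path) > len(best):
            best = path
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    graph = planted_path_graph(30, 0.05, rng)
    found = randomized_extension(graph, rng)
    print(len(found), found[:5])
```

With p = 0 the graph is exactly the true chain, and any restart recovers the whole chain because the path is grown in both directions. As p grows, random choices increasingly follow spurious links and stall early. That is the kind of local ambiguity the paper's algorithm is designed to overcome through global consistency.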