The evolutionary capacity of protein structures

  • Authors:
  • Leonid Meyerguz;David Kempe;Jon Kleinberg;Ron Elber

  • Affiliations:
  • Cornell University, Ithaca, NY;University of Washington, Seattle, WA;Cornell University, Ithaca, NY;Cornell University, Ithaca, NY

  • Venue:
  • RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
  • Year:
  • 2004

Abstract

In nature, one finds large collections of different protein sequences exhibiting roughly the same three-dimensional structure, and this observation underpins the study of structural protein families. In studying such families at a global level, a natural question to ask is how close to "optimal" the native sequences are in terms of their energy. We therefore define and compute the evolutionary capacity of a protein structure as the total number of sequences whose energy in the structure is below that of the native sequence. An important aspect of our definition is that we consider the space of all possible protein sequences, i.e. the exponentially large set of all strings over the 20-letter amino acid alphabet, rather than just the set of sequences found in nature.

In order to make our approach computationally feasible, we develop randomized algorithms that perform approximate enumeration in sequence space with provable performance guarantees. We draw on the area of rapidly mixing Markov chains, by exhibiting a connection between the evolutionary capacity of proteins and the number of feasible solutions to the Knapsack problem. This connection allows us to design an algorithm for approximating the evolutionary capacity, extending a recent result of Morris and Sinclair on the Knapsack problem. We present computational experiments that show the method to be effective in practice on large collections of protein structures. In addition, we show how to use approximations to the evolutionary capacity to compute a statistical mechanics notion of "evolutionary temperature" on sequence space.
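
As a toy illustration of the definition and of the Knapsack connection, the following Python sketch counts the sequences whose energy lies strictly below that of a native sequence. It is not the paper's randomized algorithm: it assumes a site-independent (additive) energy with small integer values, which the abstract does not specify, and the names `ENERGY`, `NATIVE`, `capacity_bruteforce`, and `capacity_dp` are hypothetical. Under that assumption, each position contributes a "weight" and the native energy acts as the knapsack bound, so an exact count is possible by dynamic programming on tiny instances.

```python
from itertools import product


def native_energy(energy, native):
    """Total energy of the native sequence under the assumed additive model."""
    return sum(energy[i][a] for i, a in enumerate(native))


def capacity_bruteforce(energy, native):
    """Count sequences with energy strictly below the native's by enumeration."""
    n, k = len(energy), len(energy[0])
    bound = native_energy(energy, native)
    return sum(
        1
        for seq in product(range(k), repeat=n)
        if sum(energy[i][a] for i, a in enumerate(seq)) < bound
    )


def capacity_dp(energy, native):
    """Same count via a knapsack-style DP over integer partial energies."""
    bound = native_energy(energy, native)
    counts = {0: 1}  # partial energy -> number of sequence prefixes reaching it
    for row in energy:
        nxt = {}
        for e, c in counts.items():
            for de in row:
                nxt[e + de] = nxt.get(e + de, 0) + c
        counts = nxt
    return sum(c for e, c in counts.items() if e < bound)


# Made-up instance: 4 positions, 3-letter alphabet (real proteins use 20 letters).
# ENERGY[i][a] is the assumed energy of letter a at position i.
ENERGY = [
    [0, 2, 5],
    [1, 1, 4],
    [3, 0, 2],
    [2, 6, 1],
]
NATIVE = (0, 1, 2, 0)  # native energy 0 + 1 + 2 + 2 = 5

if __name__ == "__main__":
    print(capacity_bruteforce(ENERGY, NATIVE), capacity_dp(ENERGY, NATIVE))
```

On this instance both functions report 8 of the 3^4 = 81 possible sequences. Enumeration scales as 20^n for real proteins, and the dynamic program depends on energies being small integers, which is why the abstract turns to approximate counting with rapidly mixing Markov chains instead.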