Trilogy: discovery of sequence-structure patterns across diverse proteins

  • Authors: Phil Bradley; Peter S. Kim; Bonnie Berger
  • Affiliations: MIT, Cambridge, MA; MIT, Cambridge, MA; MIT, Cambridge, MA
  • Venue: Proceedings of the sixth annual international conference on Computational biology
  • Year: 2002

Abstract

We describe a new computer program, Trilogy, for the automated discovery of sequence-structure patterns in proteins. Trilogy implements a pattern discovery algorithm that begins with an exhaustive analysis of flexible three-residue patterns; a subset of these patterns is selected as seeds for an extension process in which longer patterns are identified. A key feature of the method is the explicit treatment of both the sequence and structure components of these motifs: each Trilogy pattern is a pair consisting of a sequence pattern and a structure pattern. Matches to each of these component patterns are identified independently, allowing the program to assign a significance score to each sequence-structure pattern that assesses the degree of correlation between the corresponding sequence and structure motifs. Trilogy identifies several thousand high-scoring patterns that occur across protein families. These include both previously identified and novel motifs. We expect that these sequence-structure patterns will be useful in predicting protein structure from sequence, annotating newly determined protein structures, and identifying novel motifs of potential functional or structural significance.
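The idea of scoring a sequence-structure pair by how strongly its two independently matched components co-occur can be illustrated with a small sketch. The abstract does not state which statistic Trilogy uses, so the hypergeometric upper-tail probability below is an assumption chosen for illustration only. The function name `overlap_p_value` and the toy match sets are likewise hypothetical and do not come from the paper.

```python
from math import comb


def overlap_p_value(n_total, n_seq, n_struct, n_both):
    """Hypergeometric upper tail P(X >= n_both).

    ASSUMPTION: an illustrative statistic, not the score defined by Trilogy.
    n_total  -- number of candidate sites examined
    n_seq    -- sites matching the sequence pattern
    n_struct -- sites matching the structure pattern
    n_both   -- sites matching both patterns
    """
    if min(n_total, n_seq, n_struct, n_both) < 0:
        raise ValueError("counts must be non-negative")
    if max(n_seq, n_struct) > n_total or n_both > min(n_seq, n_struct):
        raise ValueError("inconsistent counts")
    # Ways to choose the structure matches among all sites.
    denom = comb(n_total, n_struct)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_seq, k) * comb(n_total - n_seq, n_struct - k)
        for k in range(n_both, min(n_seq, n_struct) + 1)
    )
    return tail / denom


# Toy data: hypothetical site identifiers matched by each component pattern.
seq_matches = {1, 2, 3, 4, 5}
struct_matches = {3, 4, 5, 6}
n_sites = 100  # hypothetical number of candidate sites

p = overlap_p_value(
    n_sites,
    len(seq_matches),
    len(struct_matches),
    len(seq_matches & struct_matches),
)
print(f"P(overlap >= observed) = {p:.3g}")
```

A small probability indicates that the sequence and structure matches coincide more often than independent matching would predict, which is the kind of correlation the significance score is described as assessing.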