RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
Conditional graphical models for protein structure prediction
We describe a new computer program, Trilogy, for the automated discovery of sequence-structure patterns in proteins. Trilogy implements a pattern discovery algorithm that begins with an exhaustive analysis of flexible three-residue patterns; a subset of these patterns is selected as seeds for an extension process in which longer patterns are identified. A key feature of the method is explicit treatment of both the sequence and structure components of these motifs: each Trilogy pattern is a pair consisting of a sequence pattern and a structure pattern. Matches to each of these component patterns are identified independently, allowing the program to assign a significance score to each sequence-structure pattern that assesses the degree of correlation between the corresponding sequence and structure motifs. Trilogy identifies several thousand high-scoring patterns that occur across protein families. These include both previously identified and novel motifs. We expect that these sequence-structure patterns will be useful in predicting protein structure from sequence, annotating newly determined protein structures, and identifying novel motifs of potential functional or structural significance.
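
The scoring idea in the abstract, matching the sequence and structure components independently and then asking how strongly the two sets of matches coincide, can be illustrated with a small sketch. The abstract does not say which statistic Trilogy uses, so the hypergeometric tail probability below is an assumption chosen for illustration, and the names `hypergeom_tail` and `score_pattern` are hypothetical rather than part of the program.

```python
from math import comb


def hypergeom_tail(total, seq_matches, struct_matches, both):
    """P(overlap >= both) if sequence and structure matches were unrelated.

    total          -- number of candidate residue triples examined
    seq_matches    -- triples matching the sequence pattern
    struct_matches -- triples matching the structure pattern
    both           -- triples matching both patterns
    """
    upper = min(seq_matches, struct_matches)
    denom = comb(total, struct_matches)
    numer = sum(
        comb(seq_matches, k) * comb(total - seq_matches, struct_matches - k)
        for k in range(both, upper + 1)
    )
    return numer / denom


def score_pattern(seq_hits, struct_hits, total_sites):
    """Score one sequence-structure pair from two independently found hit sets.

    seq_hits and struct_hits are sets of site identifiers (assumed hashable);
    a smaller return value means a stronger sequence-structure correlation.
    """
    both = len(seq_hits & struct_hits)
    return hypergeom_tail(total_sites, len(seq_hits), len(struct_hits), both)
```

In this sketch a pattern pair whose sequence and structure matches coincide far more often than chance would predict receives a small tail probability, which is one plausible way to read "degree of correlation"; any thresholds or corrections for testing many patterns would have to come from the paper itself.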