A protein can be considered either as a string (over the alphabet of 20 amino acids) or as a structure (each protein folds into a particular 3D configuration). Consider the following string-based problem: given two protein strings that are not necessarily similar in their entirety, determine the most similar contiguous substrings, one from each protein. The exact meaning of "most similar" is determined by the user, through user-specified scores for character-vs.-character similarity and for character-vs.-space similarity. Allowing spaces (gaps) is important because evolutionary changes to proteins often involve the insertion or deletion of one or more individual amino acids. For this kind of string-based similarity, the most similar substrings can be found in O(mn) time using dynamic programming (DP), where m and n are the lengths of the two strings.
The goal here is to design an algorithm for the similarity of protein structures rather than protein strings. Our algorithm is inspired by the DP-based similarity algorithm for strings: instead of comparing sequences of characters, we compare sequences of vectors. One complication of working with structures instead of strings is orientation: two structures with similar shapes can "look different" if they are at different orientations. Algorithmically, this means that we must find the optimal orientations of the two proteins as well as the similar subsequences. In other words, structural similarity involves both discrete optimization (to find the corresponding subsequences) and continuous optimization (to find the optimal orientation). Interestingly, if the correspondence is given, the optimal orientation for that correspondence is easy to find, and if the orientation is given, the optimal correspondence for that orientation is easy to find. The challenge is to accomplish both optimizations at once.
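The string-based DP referred to above is, in its standard form, the Smith-Waterman local-alignment recurrence. Below is a minimal Python sketch, assuming a linear gap penalty (every character-vs.-space pair receives the same `gap` score) and a user-supplied `score(x, y)` function; the names `local_align`, `score`, and `gap` are illustrative and not taken from the original work. It fills an (m+1) x (n+1) table, so it runs in O(mn) time and space.

```python
def local_align(a, b, score, gap):
    """Smith-Waterman local alignment with a linear gap penalty (illustrative sketch).

    score(x, y): user-supplied similarity of two characters.
    gap: score (usually negative) for aligning a character against a space.
    Returns (best_score, substring_of_a, substring_of_b).
    """
    m, n = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(
                0,                                            # start a fresh alignment here
                H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # character vs. character
                H[i - 1][j] + gap,                            # a[i-1] vs. a space
                H[i][j - 1] + gap,                            # b[j-1] vs. a space
            )
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until a zero cell marks the start of the alignment.
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:bi], b[j:bj]
```

For example, with `score = lambda p, q: 2 if p == q else -1` and `gap = -1`, `local_align("GGGHEAGAW", "TTHEAGAWTT", score, -1)` returns `(12, "HEAGAW", "HEAGAW")`: the unrelated flanks are dropped because the table is allowed to restart at zero.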
Note that the technique presented here produces a globally optimal solution; there are no approximations or assumptions of randomness.
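To make one of the two "easy" sub-problems concrete: once a correspondence between vectors is fixed, the best orientation has a closed form. The sketch below is a simplification, assuming 2D vectors and rotation only (vectors, unlike points, are unaffected by translation); for 3D structures the analogous least-squares rotation is commonly computed with the SVD-based Kabsch method. The helper names `rotate`, `best_rotation_2d`, and `residual` are hypothetical. Because rotation preserves length, minimizing the sum of |R(θ)p_i - q_i|² is the same as maximizing the sum of q_i · R(θ)p_i = cos θ · Σ(p_i · q_i) + sin θ · Σ(p_i × q_i), whose maximum is at θ = atan2(Σ p_i × q_i, Σ p_i · q_i).

```python
import math


def rotate(v, theta):
    """Rotate the 2D vector v counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (v[0] * c - v[1] * s, v[0] * s + v[1] * c)


def best_rotation_2d(pairs):
    """Angle minimizing the sum of |rotate(p, theta) - q|^2 over matched pairs (p, q).

    Hypothetical helper: assumes 2D vectors and a fixed correspondence.
    Equivalent to maximizing cos(theta) * dot_sum + sin(theta) * cross_sum.
    """
    dot_sum = sum(p[0] * q[0] + p[1] * q[1] for p, q in pairs)
    cross_sum = sum(p[0] * q[1] - p[1] * q[0] for p, q in pairs)
    return math.atan2(cross_sum, dot_sum)


def residual(pairs, theta):
    """Sum of squared distances between rotate(p, theta) and q."""
    total = 0.0
    for p, q in pairs:
        rp = rotate(p, theta)
        total += (rp[0] - q[0]) ** 2 + (rp[1] - q[1]) ** 2
    return total
```

The reverse direction, finding the best correspondence for a fixed orientation, amounts to running a string-style alignment DP on the rotated vectors with a vector-similarity score in place of character scores. Simply alternating the two steps is a natural heuristic, but it can stall at a local optimum; it is not the globally optimal technique described here.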