Searching Protein 3-D Structures in Linear Time

Authors:
Tetsuo Shibuya
Affiliations:
Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan 108-8639
Venue:
RECOMB '09 Proceedings of the 13th Annual International Conference on Research in Computational Molecular Biology
Year:
2009



Abstract

Finding similar structures in 3-D structure databases of proteins is becoming an increasingly important issue in post-genomic molecular biology. To compare the 3-D structures of two molecules, biologists mostly use the RMSD (root mean square deviation) as the similarity measure. We propose new theoretically and practically fast algorithms for the fundamental problem of finding all the substructures of structures in a database of chain molecules (such as proteins) whose RMSDs to the query are within a given constant threshold. We first propose a linear-expected-time algorithm for this problem, improving on the previous best-known time complexity of O(N log m), where N is the database size and m is the query size. For the expected-time analysis, we propose to use the random-walk model (or ideal chain model) as the model of average protein structures. We furthermore propose a series of preprocessing algorithms that enable faster queries. We checked the performance of our linear-expected-time algorithm through computational experiments over the whole PDB database. According to the experiments, our algorithm is 3.6 to 28 times faster than previously known algorithms for ordinary queries. Moreover, the experimental results support the validity of our theoretical analyses.
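To make the problem statement concrete, the following is a minimal Python sketch of the naive baseline: slide a window of length m over a chain of N points and report every window whose RMSD to the query, minimized over rigid motions, is within the threshold. It is not the linear-expected-time algorithm of the paper. The function names `min_rmsd` and `find_matches` are hypothetical, and the superposition step uses Horn's quaternion formulation with a fixed number of power-iteration steps, which is an assumption chosen to keep the example dependency-free rather than anything stated in the abstract.

```python
import math

def _centered(points):
    """Translate a list of (x, y, z) points so that its centroid is the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [(p[0] - c[0], p[1] - c[1], p[2] - c[2]) for p in points]

def min_rmsd(a, b, iterations=300):
    """RMSD between equal-length point lists a and b, minimized over
    translations and proper rotations (Horn's quaternion formulation)."""
    n = len(a)
    a, b = _centered(a), _centered(b)
    # Sum of squared norms of both centered point sets.
    g = sum(x * x + y * y + z * z for x, y, z in a + b)
    # Cross-covariance sums S[i][j] = sum over points of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)]
         for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its largest eigenvalue lam gives
    # min squared error = g - 2 * lam.
    nmat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shifting by the Frobenius norm makes every eigenvalue non-negative,
    # so power iteration converges towards the largest eigenvalue of nmat.
    shift = math.sqrt(sum(v * v for row in nmat for v in row))
    if shift == 0.0:
        return math.sqrt(g / n)
    v = [1.0, 0.5, 0.25, 0.125]  # arbitrary start vector (assumption)
    for _ in range(iterations):
        w = [sum(nmat[i][j] * v[j] for j in range(4)) + shift * v[i]
             for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v approximates the top eigenvalue.
    lam = sum(v[i] * nmat[i][j] * v[j] for i in range(4) for j in range(4))
    return math.sqrt(max(0.0, g - 2.0 * lam) / n)

def find_matches(database, query, threshold):
    """Naive scan: start indices of all windows within the RMSD threshold."""
    m = len(query)
    return [i for i in range(len(database) - m + 1)
            if min_rmsd(database[i:i + m], query) <= threshold]
```

Each window costs O(m) work for the covariance sums plus a constant number of iterations, so the scan is O(Nm) overall. Because the Rayleigh quotient never exceeds the true largest eigenvalue, an under-converged iteration can only overestimate the RMSD, so it may miss a borderline match but should not report a false one. Nearly collinear inputs converge slowly and may need more iterations.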