Linear-time protein 3-D structure searching with insertions and deletions

Authors:
Tetsuo Shibuya;Jesper Jansson;Kunihiko Sadakane
Affiliations:
Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan;Ochanomizu University, Tokyo, Japan;National Institute of Informatics, Tokyo, Japan
Venue:
WABI'09 Proceedings of the 9th international conference on Algorithms in bioinformatics
Year:
2009

Abstract

Searching molecular 3-D structure databases for similar structures is becoming increasingly important in the structural biology of the post-genomic era. Two molecules are said to be similar if the RMSD (root mean square deviation) between them is less than or equal to some given constant bound. In this paper, we consider the fundamental problem of finding all similar substructures in 3-D structure databases of chain molecules (such as proteins), allowing for indels (i.e., insertions and deletions). The problem has been believed to be very difficult, but its computational complexity has not been well understood. We first show that the same problem in arbitrary dimension is NP-hard. We then propose a new algorithm that dramatically improves the average-case time complexity of the problem when the number of indels k is bounded by some constant. Our algorithm solves the problem in average O(N) time, whereas the best previously known algorithm takes O(Nm^{k+1}) time, for a query of size m and a database of size N.
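The similarity criterion in the abstract can be illustrated with a minimal sketch. It assumes that the RMSD is measured after optimal rigid superposition (rotation plus translation) of two equal-length point sequences whose i-th points already correspond, and it uses the standard SVD-based least-squares fit. The function names `rmsd_after_superposition` and `is_similar` are hypothetical, NumPy is an assumed dependency, and the sketch handles neither indels nor database search; it is not the algorithm proposed in the paper.

```python
import math

import numpy as np


def rmsd_after_superposition(p, q):
    """RMSD between two m x 3 point sets after optimal rigid superposition.

    Assumes p[i] corresponds to q[i]; no insertions or deletions are handled.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.ndim != 2 or p.shape != q.shape:
        raise ValueError("p and q must be 2-D arrays of the same shape")

    # The optimal translation aligns the two centroids.
    pc = p - p.mean(axis=0)
    qc = q - q.mean(axis=0)

    # Singular values of the cross-covariance matrix give the best
    # achievable overlap under rotation.
    u, s, vt = np.linalg.svd(pc.T @ qc)

    # Flip the smallest singular value when the best orthogonal map
    # would be a reflection, so that only proper rotations are allowed.
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]

    # Minimal sum of squared distances = |pc|^2 + |qc|^2 - 2 * sum(s).
    total = (pc ** 2).sum() + (qc ** 2).sum() - 2.0 * s.sum()
    return math.sqrt(max(total / len(p), 0.0))  # clamp rounding noise


def is_similar(p, q, bound):
    """Similarity test from the abstract: RMSD <= a given constant bound."""
    return rmsd_after_superposition(p, q) <= bound
```

Calling `is_similar(p, q, bound)` on a query and one equal-length database fragment performs a single comparison; scanning every fragment of a database this way, before indels are even considered, is the kind of baseline cost that the average O(N) result improves on.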