Finding longest common segments in protein structures in nearly linear time

  • Authors:
  • Yen Kaow Ng;Hirotaka Ono;Ling Ge;Shuai Cheng Li

  • Affiliations:
  • Department of Computer Science, Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, Malaysia;Department of Economic Engineering, Faculty of Economics, Kyushu University, Japan;College of Business, University of Massachusetts Dartmouth, North Dartmouth;Department of Computer Science, City University of Hong Kong, Hong Kong

  • Venue:
  • CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

The Local/Global Alignment (Zemla, 2003), or LGA, is a popular method for the comparison of protein structures. One of the two components of LGA requires us to compute the longest common contiguous segments between two protein structures. That is, given two structures A=(a1, …, an) and B=(b1, …, bn) where ak, bk∈ℝ3, we are to find, among all the segments f=(ai,…,aj) and g=(bi,…,bj) that fulfill a certain criterion regarding their similarity, those of the maximum length. We consider the following criteria: (1) the root mean square deviation (RMSD) between f and g is to be within a given t∈ℝ; (2) f and g can be superposed such that for each k, i≤k≤j, ||ak−bk||≤t for a given t∈ℝ. We give an algorithm of $O(n\log n+n\mbox{\it \textbf{l}})$ time complexity when the first requirement applies, where $\mbox{\it \textbf{l}}$ is the maximum length of the segments fulfilling the criterion. We show an FPTAS which, for any ε∈ℝ, finds a segment of length at least l, but of RMSD up to (1+ε)t, in O(nlogn+n/ε) time. We propose an FPTAS which for any given ε∈ℝ, finds all the segments f and g of the maximum length which can be superposed such that for each k, i≤k≤j, ||ak−bk||≤(1+ε) t, thus fulfilling the second requirement approximately. The algorithm has a time complexity of O(nlog2n/ε5) when consecutive points in A are separated by the same distance (which is the case with protein structures).