Finding Largest Well-Predicted Subset of Protein Structure Models

  • Authors:
  • Shuai Cheng Li; Dongbo Bu; Jinbo Xu; Ming Li

  • Affiliations:
  • David R. Cheriton School of Computer Science, University of Waterloo, Canada; David R. Cheriton School of Computer Science, University of Waterloo, Canada and Institute of Computing Technology, Chinese Academy of Sciences, China; Toyota Technological Institute at Chicago, USA; David R. Cheriton School of Computer Science, University of Waterloo, Canada

  • Venue:
  • CPM '08: Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching
  • Year:
  • 2008

Abstract

Evaluating the quality of models is a basic problem in protein structure prediction. Numerous evaluation criteria have been proposed, and one of the most intuitive requires finding a largest well-predicted subset: a maximum subset of the model that matches the native structure [12]. The problem is solvable in O(n^7) time, which is too slow for practical use. We present a (1 + ε)d distance approximation algorithm that runs in O(n^3 log n / ε^5) time for general protein structures. For globular proteins, this result can be improved to a randomized O(n log^2 n) time algorithm that succeeds with probability at least 1 − O(1/n). In addition, we propose a (1 + ε)-approximation algorithm that computes the minimum distance needed to fit all the points of a model to its native structure in O(n (log log n + log(1/ε)) / ε^5) time. We have implemented our algorithms, and results indicate that our program finds many more matched pairs in less running time than TMScore, one of the most popular tools for assessing the quality of predicted models.
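
To make the objective concrete, the following Python sketch evaluates one candidate superposition: it applies a rigid transformation to the model and lists the residues that land within a distance threshold d of their native counterparts. This is only the quantity being maximized, not any of the algorithms from the paper. The function names (`apply_rigid`, `matched_pairs`), the residue-by-residue correspondence, the use of Euclidean distance, and the toy coordinates are all assumptions made for illustration.

```python
import math


def apply_rigid(point, rotation, translation):
    """Apply a 3x3 rotation matrix and a translation vector to one 3D point."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def matched_pairs(model, native, rotation, translation, d):
    """Return the indices i for which the transformed model[i] lies
    within distance d of native[i] (assumed one-to-one correspondence)."""
    matched = []
    for i, (p, q) in enumerate(zip(model, native)):
        if math.dist(apply_rigid(p, rotation, translation), q) <= d:
            matched.append(i)
    return matched


# Hypothetical toy data: four residues on a line, one badly predicted.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.1), (1.0, 0.2, 0.0), (2.0, 0.0, 0.0), (3.0, 5.0, 0.0)]

print(matched_pairs(model, native, identity, (0.0, 0.0, 0.0), d=0.5))  # [0, 1, 2]
```

Under these assumptions, the largest well-predicted subset corresponds to the transformation that maximizes the length of the returned list. The difficulty lies in searching over all rigid transformations, which is where the running times quoted in the abstract come from.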