Approximation algorithms for NMR spectral peak assignment

  • Authors:
  • Zhi-Zhong Chen;Tao Jiang;Guohui Lin;Jianjun Wen;Dong Xu;Jinbo Xu;Ying Xu

  • Affiliations:
  • Department of Mathematical Sciences, Tokyo Denki University, Hatoyama, Saitama 350-0394, Japan;Department of Computer Science, University of California, Riverside, CA;Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada;Department of Computer Science, University of California, Riverside, CA;Protein Informatics Group, Oak Ridge National Laboratory, Oak Ridge, TN;Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada;Protein Informatics Group, Oak Ridge National Laboratory, Oak Ridge, TN

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2003

Quantified Score

Hi-index 5.23

Abstract

We study a constrained bipartite matching problem where the input is a weighted bipartite graph G = (U, V, E), U is a set of vertices following a sequential order, V is another set of vertices partitioned into a collection of disjoint subsets, each following a sequential order, and E is a set of edges between U and V with non-negative weights. The objective is to find a maximum-weight matching in G that satisfies the given sequential orders on both U and V, i.e. if u_{i+1} follows u_i in U and v_{j+1} follows v_j in V, then u_i is matched with v_j if and only if u_{i+1} is matched with v_{j+1}. The problem has recently been formulated as a crucial step in an algorithmic approach for interpreting NMR spectral data (IEEE Comput. Sci. Eng. 4 (2002) 50-62). The interpretation of NMR spectral data is known to be a key problem in protein structure determination via NMR spectroscopy. Unfortunately, the constrained bipartite matching problem is NP-hard (IEEE Comput. Sci. Eng. 4 (2002) 50-62). We first propose a 2-approximation algorithm for the problem, which follows directly from the recent result of Bar-Noy et al. (Proc. 32nd ACM Symp. on Theory of Computing (STOC'00), 2000, pp. 735-744) on interval scheduling. However, our extensive experimental results on real NMR spectral data show that this algorithm performs poorly in terms of recovering target-matching edges. We then propose another approximation algorithm that tries to take advantage of the "density" of the sequential order information in V. Although we are only able to prove an approximation ratio of 3 log_2 D for this algorithm, where D is the length of a longest string (ordered subset) in V, the experimental results demonstrate that this new algorithm performs much better on real data, i.e. it is able to recover a large fraction of target-matching edges, and the weight of its output matching is often close to the maximum. We also prove that the problem is MAX SNP-hard, even if the input bipartite graph is unweighted.
We further present an approximation algorithm for a nontrivial special case that breaks the ratio 2 barrier.
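
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not one of the approximation algorithms from the paper. It assumes one reading of the constraint: each ordered subset (string) of V is either left unmatched or placed as a whole on consecutive vertices of U, and no two strings may share a vertex of U. The function name `best_constrained_matching`, the dictionary-based edge representation, and the toy instance are all hypothetical and chosen only for illustration. The search is exponential in the number of strings, so it is usable only on tiny inputs.

```python
from itertools import product


def best_constrained_matching(n_u, strings, weight):
    """Exhaustively search placements of the strings of V on U.

    n_u     : number of vertices in U, indexed 0..n_u-1 in sequential order
    strings : list of lists; each inner list is one string of V, in order
    weight  : dict mapping (u_index, v) to a non-negative edge weight;
              a missing key means there is no edge
    Returns (total_weight, placement), where placement[k] is the index in U
    at which string k starts, or None if string k is left unmatched.
    """

    def options(s):
        # A string may stay unmatched (None) or start at any position
        # where every required edge exists.
        opts = [None]
        for start in range(n_u - len(s) + 1):
            if all((start + t, v) in weight for t, v in enumerate(s)):
                opts.append(start)
        return opts

    best = (0, [None] * len(strings))
    for placement in product(*(options(s) for s in strings)):
        used = set()   # vertices of U already covered
        total = 0
        feasible = True
        for s, start in zip(strings, placement):
            if start is None:
                continue
            block = range(start, start + len(s))
            if used.intersection(block):
                feasible = False   # two strings overlap on U
                break
            used.update(block)
            total += sum(weight[(start + t, v)] for t, v in enumerate(s))
        if feasible and total > best[0]:
            best = (total, list(placement))
    return best


# Toy instance: U has 4 vertices; V is split into the strings ab, c and d.
toy_weight = {
    (0, "a"): 3, (1, "b"): 3,   # string ab starting at position 0
    (1, "a"): 5, (2, "b"): 4,   # string ab starting at position 1
    (0, "c"): 1, (1, "c"): 4,
    (2, "d"): 6, (3, "d"): 3,
}
toy_strings = [["a", "b"], ["c"], ["d"]]
result = best_constrained_matching(4, toy_strings, toy_weight)
print(result)  # (13, [1, 0, 3])
```

Under this reading, each feasible (string, start position) pair behaves like a weighted interval on U, and a solution picks at most one interval per string with no overlaps. That view suggests why results on interval scheduling apply to the problem, although the sketch itself makes no attempt to reproduce the cited 2-approximation.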