Improved Approximation Algorithms for NMR Spectral Peak Assignment

  • Authors:
  • Zhi-Zhong Chen;Tao Jiang;Guohui Lin;Jianjun Wen;Dong Xu;Ying Xu

  • Affiliations:
  • -;-;-;-;-;-

  • Venue:
  • WABI '02 Proceedings of the Second International Workshop on Algorithms in Bioinformatics
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

We study a constrained bipartite matching problem where the input is a weighted bipartite graph G = (U, V,E), U is a set of vertices following a sequential order, V is another set of vertices partitioned into a collection of disjoint subsets, each following a sequential order, and E is a set of edges between U and V with non-negative weights. The objective is to find a matching in G with the maximum weight that satisfies the given sequential orders on both U and V , i.e., if ui+1 follows ui in U and if vj+1 follows vj in V, then ui is matched with vj if and only if ui+1 is matched with vj+1. The problem has recently been formulated as a crucial step in an algorithmic approach for interpreting NMR spectral data [15]. The interpretation of NMR spectral data is known as a key problem in protein structure determination via NMR spectroscopy. Unfortunately, the constrained bipartite matching problem is NP-hard [15]. We first propose a 2-approximation algorithm for the problem, which follows directly from the recent result of Bar-Noy et al. [2] on interval scheduling. However, our extensive experimental results on real NMR spectral data illustrate that the algorithm performs poorly in terms of recovering the target-matching (i.e. correct) edges. We then propose another approximation algorithm that tries to take advantage of the "density" of the sequential order information in V. Although we are only able to prove an approximation ratio of 3 log2 D for this algorithm, where D is the length of a longest string in V, the experimental results demonstrate that this new algorithm performs much better on real data, i.e. it is able to recover a large fraction of the target-matching edges and the weight of its output matching is often in fact close to the maximum. We also prove that the problem is MAX SNP-hard, even if the input bipartite graph is unweighted. We further present an approximation algorithm for a nontrivial special case that breaks the ratio 2 barrier.