Distinguishing string selection problems

  • Authors:
  • J. Kevin Lanctot;Ming Li;Bin Ma;Shaojiu Wang;Louxin Zhang

  • Affiliations:
  • Department of Computer Science, University of Waterloo, Waterloo, Ont., Canada N2L 3G1;Department of Computer Science, University of Waterloo, Waterloo, Ont., Canada N2L 3G1;Department of Computer Science, University of Western Ontario, London, Ont., Canada N6A 5B7;Pasteur Merieux Connaught Canada, 1755 Steeles Avenue West, Toronto, Ont., Canada M2R 3T4;Department of Mathematics, National University of Singapore, Singapore 117543, Singapore

  • Venue:
  • Information and Computation
  • Year:
  • 2003

Abstract

This paper presents a collection of string algorithms that are at the core of several biological problems such as discovering potential drug targets, creating diagnostic probes, universal primers or unbiased consensus sequences. All these problems reduce to the task of finding a pattern that, with some error, occurs in one set of strings (Closest Substring Problem) and does not occur in another set (Farthest String Problem). In this paper, we break down the problem into several subproblems and prove the following results.

1. The following are all NP-hard: the Farthest String Problem, the Closest Substring Problem, and the Closest String Problem of finding a string that is close to each string in a set.
2. There is a PTAS for the Farthest String Problem based on a linear programming relaxation technique.
3. There is a polynomial-time (4/3 + ε)-approximation algorithm for the Closest String Problem for any small constant ε > 0. Using this algorithm, we also provide an efficient heuristic algorithm for the Closest Substring Problem.
4. The problem of finding a string that is at least Hamming distance d from as many strings in a set as possible cannot be approximated within n^ε in polynomial time for some fixed constant ε unless NP = P, where n is the number of strings in the set.
5. There is a polynomial-time 2-approximation for finding a string that is both the Closest Substring to one set and the Farthest String from another set.
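To make the problem definitions concrete, the following is a minimal Python sketch of the objectives named in the abstract, solved by exhaustive search on tiny inputs. It is not the paper's PTAS or its approximation algorithms: the search is exponential in the string length, and all function names here are hypothetical, chosen only for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def closest_string_brute_force(strings, alphabet):
    """Closest String: a center minimizing the maximum Hamming distance
    to every input string. Exhaustive search, for illustration only."""
    length = len(strings[0])
    candidates = ("".join(c) for c in product(alphabet, repeat=length))
    best = min(candidates, key=lambda c: max(hamming(c, s) for s in strings))
    return best, max(hamming(best, s) for s in strings)


def farthest_string_brute_force(strings, alphabet):
    """Farthest String: a string maximizing the minimum Hamming distance
    to every input string. Exhaustive search, for illustration only."""
    length = len(strings[0])
    candidates = ("".join(c) for c in product(alphabet, repeat=length))
    best = max(candidates, key=lambda c: min(hamming(c, s) for s in strings))
    return best, min(hamming(best, s) for s in strings)


def closest_substring_radius(center, strings):
    """Closest Substring objective for a fixed center: for each input
    string take its best-matching window of len(center), then report
    the worst of those best matches."""
    k = len(center)
    return max(
        min(hamming(center, s[i:i + k]) for i in range(len(s) - k + 1))
        for s in strings
    )


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "TCGT"]
    print(closest_string_brute_force(seqs, "ACGT"))
    print(farthest_string_brute_force(seqs, "ACGT"))
    print(closest_substring_radius("CG", ["ACGT", "TTCGA"]))
```

The Closest String and Farthest String searches differ only in swapping min and max, which mirrors how the abstract treats them as complementary requirements on the same pattern.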