Efficient Algorithms for the Closest String and Distinguishing String Selection Problems

  • Authors:
  • Lusheng Wang; Binhai Zhu

  • Affiliations:
  • Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong; Department of Computer Science, Montana State University, Bozeman, MT 59717, USA

  • Venue:
  • FAW '09 Proceedings of the 3rd International Workshop on Frontiers in Algorithmics
  • Year:
  • 2009

Abstract

In this paper, we study three related problems: the closest string problem, the farthest string problem, and the distinguishing string selection problem. These problems have applications in motif detection, binding site location, genetic drug target identification, genetic probe design, universal PCR primer design, etc. They have been extensively studied in recent years. The problems are defined as follows.

The closest string problem: given a group of strings ${\cal B}=\{s_1, s_2, \ldots, s_n\}$, each of length $L$, and an integer $d$, compute a center string $s$ of length $L$ such that the Hamming distance $d(s,s_i)\leq d$ for all $s_i\in {\cal B}$.

The farthest string problem: given a group of strings ${\cal G}=\{g_1,g_2,\ldots,g_{n_2}\}$, with all strings of the same length $L$, and an integer $d_b$, compute a center string $s$ of length $L$ such that the Hamming distance $d(s,g_j)\geq d_b$ for all $g_j\in {\cal G}$.

The distinguishing string selection problem: given two groups of strings ${\cal B}$ (bad genes) and ${\cal G}$ (good genes), ${\cal B}=\{s_1,s_2,\ldots,s_{n_1}\}$ and ${\cal G}=\{g_{n_1+1},g_{n_1+2},\ldots,g_{n_2}\}$, with all strings of the same length $L$, and two integers $d_b$ and $d_g$ with $d_g\geq d_b$, compute a center string $s$ of length $L$ such that the Hamming distance $d(s,s_i)\leq d_b$ for all $s_i\in{\cal B}$ and the Hamming distance $d(s,g_j)\geq d_g$ for all $g_j\in {\cal G}$.

Our results: We design an $O(Ln+nd(|\Sigma|-1)^d 2^{3.25d})$ time fixed-parameter algorithm for the closest string problem, which improves upon the best known $O(Ln+nd\,2^{4d}(|\Sigma|-1)^d)$ algorithm in [14], where $|\Sigma|$ is the size of the alphabet. We also design fixed-parameter algorithms for both the farthest string problem and the distinguishing string selection problem. Both algorithms run in time $O(Ln+nd2^{3.25d_b})$ when the input strings are binary strings over $\Sigma=\{0,1\}$.
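
To make the three problem definitions concrete, the following Python sketch checks whether a candidate center string satisfies the distance constraints, and finds a center for very small instances by exhaustive search. It is not the fixed-parameter algorithm of the paper: the search enumerates all $|\Sigma|^L$ strings and is meant only to illustrate the definitions. The names `hamming`, `is_center`, `brute_force_center`, `max_dist_bad`, and `min_dist_good` are hypothetical and do not come from the paper; `max_dist_bad` is the upper bound on the distance to strings in ${\cal B}$, and `min_dist_good` is the lower bound on the distance to strings in ${\cal G}$.

```python
from itertools import product
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_center(s: str, bad: Sequence[str], good: Sequence[str],
              max_dist_bad: int, min_dist_good: int) -> bool:
    """Check the distinguishing string selection constraints for s.

    With `good` empty this is the closest string condition;
    with `bad` empty it is the farthest string condition.
    """
    close_to_bad = all(hamming(s, b) <= max_dist_bad for b in bad)
    far_from_good = all(hamming(s, g) >= min_dist_good for g in good)
    return close_to_bad and far_from_good


def brute_force_center(bad: Sequence[str], good: Sequence[str],
                       max_dist_bad: int, min_dist_good: int,
                       alphabet: str = "01") -> Optional[str]:
    """Exhaustive search over all strings of length L (illustration only).

    Assumption: this exponential search stands in for the paper's
    algorithm purely to demonstrate the problem statement.
    """
    strings = list(bad) + list(good)
    if not strings:
        return None
    length = len(strings[0])
    for candidate in product(alphabet, repeat=length):
        s = "".join(candidate)
        if is_center(s, bad, good, max_dist_bad, min_dist_good):
            return s
    return None


if __name__ == "__main__":
    bad = ["0000", "0011"]
    good = ["1111"]
    # Closest string: within distance 1 of every string in `bad`.
    print(brute_force_center(bad, [], 1, 0))
    # Distinguishing string: also at distance >= 3 from every string in `good`.
    print(brute_force_center(bad, good, 1, 3))
```

Passing an empty `good` list reduces the check to the closest string problem, and passing an empty `bad` list reduces it to the farthest string problem, which mirrors how the distinguishing string selection problem generalizes the other two.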