More Efficient Algorithms for Closest String and Substring Problems

Authors:
Bin Ma;Xiaoming Sun
Affiliations:
binma@uwaterloo.ca;xiaomings@tsinghua.edu.cn
Venue:
SIAM Journal on Computing
Year:
2009

Abstract

The closest string problem and the closest substring problem are both natural theoretical computer science problems with important applications in computational biology. Given $n$ input strings, the closest string (substring) problem asks for a new string that is within distance $d$ of (a substring of) each input string, such that $d$ is minimized. Both problems are NP-complete. In this paper we propose new algorithms for these two problems. For the closest string problem, we develop an exact algorithm with time complexity $O(n|\Sigma|^{O(d)})$, where $\Sigma$ is the alphabet. This improves the previously best known result $O(nd^{O(d)})$ and results in a polynomial time algorithm when $d=O(\log n)$. Using this algorithm, we also give a polynomial time approximation scheme (PTAS) for the closest string problem with time complexity $O(n^{O(\epsilon^{-2})})$, improving the previously best known $O(n^{O(\epsilon^{-2}\log\frac{1}{\epsilon})})$ PTAS. A new algorithm for the closest substring problem is also proposed. Finally, we prove that a restricted version of the closest substring problem has the same parameterized complexity as the closest substring problem, answering an open question in the literature.
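To make the problem statement concrete, the following is a minimal sketch of the closest string problem, assuming equal-length inputs and Hamming distance. It is not the algorithm proposed in this paper: it uses a simple bounded-depth search of the kind that yields running times of the form $O(nd^{O(d)})$, and the function names (`hamming`, `radius`, `center_within`, `closest_string`) are illustrative choices rather than names from the source.

```python
from typing import List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def radius(center: str, strings: List[str]) -> int:
    """Largest Hamming distance from `center` to any input string."""
    return max(hamming(center, s) for s in strings)


def center_within(strings: List[str], d: int) -> Optional[str]:
    """Return a string within distance d of every input, or None if none exists."""

    def search(candidate: str, budget: int) -> Optional[str]:
        for s in strings:
            if hamming(candidate, s) > d:
                if budget == 0:
                    return None  # no modifications left to repair the candidate
                mismatches = [i for i in range(len(s)) if candidate[i] != s[i]]
                # Any valid center agrees with s on at least one of any d + 1
                # mismatch positions, so branching on d + 1 of them suffices.
                for i in mismatches[: d + 1]:
                    moved = candidate[:i] + s[i] + candidate[i + 1:]
                    found = search(moved, budget - 1)
                    if found is not None:
                        return found
                return None
        return candidate  # every input is within distance d

    # A valid center differs from the first input in at most d positions,
    # so d modifications of the first input are enough.
    return search(strings[0], d)


def closest_string(strings: List[str]) -> Tuple[str, int]:
    """Smallest d admitting a center, found by trying d = 0, 1, 2, ..."""
    length = len(strings[0])
    assert all(len(s) == length for s in strings), "inputs must have equal length"
    for d in range(length + 1):
        center = center_within(strings, d)
        if center is not None:
            return center, d
    raise AssertionError("unreachable: d = length always admits a center")
```

Calling `closest_string(["ACGT", "ACGA", "TCGT"])` returns a center string together with the minimum radius $d$. The search depth is bounded by $d$ and each level branches at most $d+1$ ways, which is why this style of search depends on $d$ rather than on the alphabet size.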