Closest Substring Problems with Small Distances

Authors:
Dániel Marx
Affiliations:
dmarx@cs.bme.hu
Venue:
SIAM Journal on Computing
Year:
2008

Abstract

We study two pattern matching problems that are motivated by applications in computational biology. In the Closest Substring problem, $k$ strings $s_1,\dots, s_k$ are given, and the task is to find a string $s$ of length $L$ such that each string $s_i$ has a consecutive substring of length $L$ whose distance from $s$ is at most $d$. We present two algorithms that aim to be efficient for small fixed values of $d$ and $k$: for some functions $f$ and $g$, the algorithms have running time $f(d)\cdot n^{O(\log d)}$ and $g(d,k)\cdot n^{O(\log\log k)}$, respectively. The second algorithm is based on connections with the extremal combinatorics of hypergraphs. The Closest Substring problem is also investigated from the parameterized complexity point of view. Answering an open question from [P. A. Evans, A. D. Smith, and H. T. Wareham, Theoret. Comput. Sci., 306 (2003), pp. 407-430; M. R. Fellows, J. Gramm, and R. Niedermeier, Combinatorica, 26 (2006), pp. 141-167; J. Gramm, J. Guo, and R. Niedermeier, Lecture Notes in Comput. Sci. 2751, Springer, Berlin, 2003, pp. 195-209; J. Gramm, R. Niedermeier, and P. Rossmanith, Algorithmica, 37 (2003), pp. 25-42], we show that the problem is W[1]-hard even if both $d$ and $k$ are parameters. As a consequence of this hardness result, our algorithms are optimal in the sense that the exponent of $n$ in the running time cannot be improved to $o(\log d)$ or to $o(\log \log k)$ (modulo some complexity-theoretic assumptions). Consensus Patterns is the variant of the problem where, instead of requiring that each $s_i$ has a substring of distance at most $d$ from $s$, we have to select the substrings in such a way that the average of these $k$ distances is at most $\delta$. By giving an $f(\delta)\cdot n^9$ time algorithm, we show that the problem is fixed-parameter tractable. This answers an open question from [M. R. Fellows, J. Gramm, and R. Niedermeier, Combinatorica, 26 (2006), pp. 141-167].
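
To make the two problem statements concrete, the following is a minimal brute-force sketch in Python. It is not any of the algorithms from the paper: it simply enumerates every candidate string of length $L$ over a given alphabet, so its running time grows as $|\Sigma|^L$ and it is only usable on toy inputs. It assumes that "distance" means Hamming distance, which the abstract does not state explicitly, and all function names and the explicit `alphabet` parameter are hypothetical choices made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_window_distance(s, center):
    """Smallest Hamming distance between `center` and any length-L window of `s`.

    Returns infinity if `s` is shorter than `center` (no window exists).
    """
    L = len(center)
    return min(
        (hamming(s[i:i + L], center) for i in range(len(s) - L + 1)),
        default=float("inf"),
    )


def closest_substring_bruteforce(strings, L, d, alphabet):
    """Closest Substring by exhaustive search (illustration only).

    Returns the first center string of length L such that every input string
    has a window within distance d of it, or None if no such center exists.
    """
    for cand in product(alphabet, repeat=L):
        center = "".join(cand)
        if all(best_window_distance(s, center) <= d for s in strings):
            return center
    return None


def consensus_patterns_bruteforce(strings, L, delta, alphabet):
    """Consensus Patterns by exhaustive search (illustration only).

    Same search, but the constraint is on the average of the k best-window
    distances instead of on each distance individually.
    """
    k = len(strings)
    for cand in product(alphabet, repeat=L):
        center = "".join(cand)
        total = sum(best_window_distance(s, center) for s in strings)
        if total <= delta * k:
            return center
    return None
```

For example, `closest_substring_bruteforce(["abcab", "bbcaa", "cabca"], 3, 0, "abc")` finds a length-3 string that occurs exactly in all three inputs. The contrast with the paper's bounds is the point of the sketch: the exhaustive search is exponential in $L$, whereas the algorithms in the abstract confine the superpolynomial dependence to the parameters $d$ and $k$.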