Fast Exact Algorithms for the Closest String and Substring Problems with Application to the Planted (L,d)-Motif Model

Authors:
Zhi-Zhong Chen; Lusheng Wang
Affiliations:
Tokyo Denki University, Hatoyama, Saitama; City University of Hong Kong, Hong Kong
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2011

Cited by (5):
A three-string approach to the closest string problem. Journal of Computer and System Sciences
The parameterized complexity of the shared center problem. CPM'12 Proceedings of the 23rd Annual Conference on Combinatorial Pattern Matching
Enumerating neighbour and closest strings. IPEC'12 Proceedings of the 7th International Conference on Parameterized and Exact Computation
An efficient two-phase ant colony optimization algorithm for the closest string problem. SEAL'12 Proceedings of the 9th International Conference on Simulated Evolution and Learning
An improved voting algorithm for planted (l,d) motif search. Information Sciences: an International Journal

Abstract

We present two parameterized algorithms for the closest string problem. The first runs in $O(nL + nd \cdot 17.97^d)$ time for DNA strings and in $O(nL + nd \cdot 61.86^d)$ time for protein strings, where $n$ is the number of input strings, $L$ is the length of each input string, and $d$ is the given upper bound on the number of mismatches between the center string and each input string. The second runs in $O(nL + nd \cdot 13.92^d)$ time for DNA strings and in $O(nL + nd \cdot 47.21^d)$ time for protein strings. We then extend the first algorithm to a new parameterized algorithm for the closest substring problem that runs in $O((n-1)m^2(L + d \cdot 17.97^d \cdot m^{\lfloor \log_2(d+1) \rfloor}))$ time for DNA strings and in $O((n-1)m^2(L + d \cdot 61.86^d \cdot m^{\lfloor \log_2(d+1) \rfloor}))$ time for protein strings. Here $n$ is again the number of input strings, $L$ is the length of the center substring, $L - 1 + m$ is the maximum length of a single input string, and $d$ is the given upper bound on the number of mismatches between the center substring and at least one substring of each input string. All of these algorithms significantly improve on the best previously known running times. To verify the theoretical improvements experimentally, we implement our algorithm in C and apply the resulting program to the planted $(L, d)$-motif problem proposed by Pevzner and Sze in 2000. We compare our program with the previously best exact program for the problem, PMSPrune (designed by Davila et al. in 2007). Our experimental data show that our program runs faster for practical cases and also for several challenging cases. Our program also uses less memory.
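
To make the closest string problem concrete, the following minimal Python sketch searches for a center string within Hamming distance d of every input string. It is not the algorithm of this paper: it is a plain bounded search tree in the style of the classic approach of Gramm, Niedermeier, and Rossmanith, which is the kind of baseline that parameterized algorithms for this problem refine. The function names `hamming` and `closest_string` are illustrative assumptions, and no attempt is made to match the running times quoted in the abstract.

```python
def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings, d):
    """Return a string within Hamming distance d of every input, or None.

    Illustrative bounded search tree (NOT the paper's algorithm): start
    from the first input string and change at most d of its positions.
    """
    def search(candidate, budget):
        for s in strings:
            dist = hamming(candidate, s)
            if dist > d:
                # Any center lies within `budget` changes of the candidate
                # and within d of s, so dist > d + budget is hopeless.
                if budget == 0 or dist > d + budget:
                    return None
                diff = [i for i in range(len(s)) if candidate[i] != s[i]]
                # A valid center agrees with s on at least one of any
                # d + 1 positions where the candidate and s differ.
                for i in diff[:d + 1]:
                    moved = candidate[:i] + s[i] + candidate[i + 1:]
                    found = search(moved, budget - 1)
                    if found is not None:
                        return found
                return None
        return candidate  # every input is within distance d

    return search(strings[0], d)


if __name__ == "__main__":
    dna = ["AATT", "TTAA", "AAAA"]
    center = closest_string(dna, 2)
    print(center, [hamming(center, s) for s in dna])
    print(closest_string(["AAAA", "TTTT"], 1))  # no center exists
```

Each node of this search tree branches at most d + 1 ways and the depth is at most d, so the tree has at most (d + 1)^d leaves regardless of alphabet size. The bounds in the abstract instead have a constant base that depends on the alphabet (DNA versus protein).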