Why large CLOSEST STRING instances are easy to solve in practice

Authors:
Christina Boucher; Kathleen Wilkie
Affiliations:
David R. Cheriton School of Computer Science, University of Waterloo; Department of Applied Mathematics, University of Waterloo
Venue:
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Year:
2010

Abstract

We initiate the study of the smoothed complexity of the CLOSEST STRING problem by proposing a semi-random model of Hamming distance. We restrict interest to the optimization version of the CLOSEST STRING problem and give a randomized algorithm, which we refer to as CSP-Greedy, that computes the closest string on smoothed instances up to a constant factor approximation in time O(l^3), where l is the string length. Using smoothed analysis, we prove that CSP-Greedy achieves a (1 + εe/2^n)^l-approximation guarantee, where ε > 0 is any small value and n is the number of strings. These approximation and runtime guarantees demonstrate that CLOSEST STRING instances with a relatively large number of input strings are efficiently solved in practice. We also give experimental results demonstrating that CSP-Greedy runs extremely efficiently on instances with a large number of strings. The counterintuitive fact that "large" CLOSEST STRING instances are easier and more efficient to solve gives new insight into this well-investigated problem.
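
To make the optimization objective concrete, the following minimal Python sketch computes the Hamming distance, the CLOSEST STRING objective (the largest distance from a candidate center to any input string), and an exhaustive reference solver for tiny instances. It is not CSP-Greedy, whose details the abstract does not give; the function names `hamming`, `radius`, and `brute_force_closest_string` are hypothetical and chosen for illustration only.

```python
from __future__ import annotations

from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius(center: str, strings: list[str]) -> int:
    """CLOSEST STRING objective: largest Hamming distance from center to any input."""
    return max(hamming(center, s) for s in strings)


def brute_force_closest_string(strings: list[str], alphabet: str) -> tuple[str, int]:
    """Exhaustive reference solver (illustrative assumption, not CSP-Greedy).

    Tries every string of the input length over the alphabet, so the running
    time is exponential in the string length; usable only on tiny instances.
    """
    length = len(strings[0])
    best, best_radius = "", length + 1
    for candidate in product(alphabet, repeat=length):
        center = "".join(candidate)
        r = radius(center, strings)
        if r < best_radius:  # keep the first center that attains the minimum
            best, best_radius = center, r
    return best, best_radius


if __name__ == "__main__":
    inputs = ["AAAA", "AATT", "TTAA"]
    print(brute_force_closest_string(inputs, "AT"))
```

An approximation algorithm with factor c returns a center whose radius is at most c times the optimum found by such an exhaustive search, which is why a factor that approaches 1 as the number of strings grows is notable.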