Why large CLOSEST STRING instances are easy to solve in practice

  • Authors: Christina Boucher; Kathleen Wilkie
  • Affiliations: David R. Cheriton School of Computer Science, University of Waterloo; Department of Applied Mathematics, University of Waterloo
  • Venue: SPIRE'10: Proceedings of the 17th International Conference on String Processing and Information Retrieval
  • Year: 2010

Abstract

We initiate the study of the smoothed complexity of the CLOSEST STRING problem by proposing a semi-random model of Hamming distance. We restrict our interest to the optimization version of the CLOSEST STRING problem and give a randomized algorithm, which we refer to as CSP-Greedy, that computes the closest string on smoothed instances up to a constant-factor approximation in time O(l^3), where l is the string length. Using smoothed analysis, we prove that CSP-Greedy achieves a (1 + εe/2n)l-approximation guarantee, where ε > 0 is any small value and n is the number of strings. These approximation and runtime guarantees demonstrate that CLOSEST STRING instances with a relatively large number of input strings are efficiently solved in practice. We also give experimental results demonstrating that CSP-Greedy runs extremely efficiently on instances with a large number of strings. The counterintuitive fact that "large" CLOSEST STRING instances are easier and more efficient to solve gives new insight into this well-investigated problem.