On the closest string and substring problems

  • Authors: Ming Li; Bin Ma; Lusheng Wang

  • Affiliations: University of Waterloo, Waterloo, Ont., Canada; University of Western Ontario, London, Ont., Canada; City University of Hong Kong, Kowloon, Hong Kong, China

  • Venue: Journal of the ACM (JACM)
  • Year: 2002

Abstract

The problem of finding a center string that is "close" to every given string arises in computational molecular biology and coding theory. This problem has two versions: the Closest String problem and the Closest Substring problem. Given a set of strings S = {s_1, s_2, ..., s_n}, each of length m, the Closest String problem is to find the smallest d and a string s of length m which is within Hamming distance d of each s_i ∈ S. This problem comes from coding theory, when we are looking for a code not too far away from a given set of codes. The Closest Substring problem, with an additional input integer L, asks for the smallest d and a string s, of length L, which is within Hamming distance d of a substring, of length L, of each s_i. This problem is much more elusive than the Closest String problem. The Closest Substring problem is formulated from applications in finding conserved regions, identifying genetic drug targets, and generating genetic probes in molecular biology. Whether there are efficient approximation algorithms for both problems are major open questions in this area. We present two polynomial-time approximation algorithms with approximation ratio 1 + ε for any small ε to settle both questions.
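
To make the two problem definitions concrete, here is a minimal Python sketch that solves tiny instances by exhaustive search. It is only an illustration of the definitions: it tries every candidate center over a given alphabet, so it runs in exponential time and is not the paper's 1 + ε approximation algorithm. The function names (`hamming`, `closest_string_bruteforce`, `closest_substring_bruteforce`) and the explicit `alphabet` parameter are assumptions introduced for this example.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_string_bruteforce(strings, alphabet):
    """Return (d, s): the smallest d and a center s of length m such that
    s is within Hamming distance d of every input string.
    Exhaustive search over all |alphabet|^m candidates (illustration only)."""
    m = len(strings[0])
    best = None
    for cand in product(alphabet, repeat=m):
        center = "".join(cand)
        # The radius of a center is its distance to the farthest input string.
        d = max(hamming(center, s) for s in strings)
        if best is None or d < best[0]:
            best = (d, center)
    return best


def closest_substring_bruteforce(strings, L, alphabet):
    """Return (d, s): the smallest d and a center s of length L such that
    every input string has some length-L substring within distance d of s.
    Exhaustive search over all |alphabet|^L candidates (illustration only)."""
    best = None
    for cand in product(alphabet, repeat=L):
        center = "".join(cand)
        # For each input string take its best-matching window of length L,
        # then take the worst of those best matches.
        d = max(
            min(hamming(center, s[i:i + L]) for i in range(len(s) - L + 1))
            for s in strings
        )
        if best is None or d < best[0]:
            best = (d, center)
    return best


if __name__ == "__main__":
    print(closest_string_bruteforce(["AAA", "AAT", "ATA"], "AT"))
    print(closest_substring_bruteforce(["GATTACA", "TTACAGG", "CATTAC"], 4, "ACGT"))
```

In the second call the input strings have different lengths, which the Closest Substring definition allows as long as each is at least L long; only the chosen windows must have length L.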