The Smoothed Complexity of Edit Distance

  • Authors:
  • Alexandr Andoni; Robert Krauthgamer

  • Affiliations:
  • MIT; Weizmann Institute and IBM Almaden

  • Venue:
  • ICALP '08: Proceedings of the 35th International Colloquium on Automata, Languages and Programming, Part I
  • Year:
  • 2008

Abstract

We initiate the study of the smoothed complexity of sequence alignment, by proposing a semi-random model of edit distance between two input strings, generated as follows. First, an adversary chooses two binary strings of length d and a longest common subsequence A of them. Then, every character is perturbed independently with probability p, except that A is perturbed in exactly the same way inside the two strings.

We design two efficient algorithms that compute the edit distance on smoothed instances up to a constant factor approximation. The first algorithm runs in near-linear time, namely $d^{1+\varepsilon}$ for any fixed $\varepsilon > 0$. The second one runs in time sublinear in d, assuming the edit distance is not too small. These approximation and runtime guarantees are significantly better than the bounds known for worst-case inputs, e.g., a near-linear time algorithm achieving approximation roughly $d^{1/3}$, due to Batu, Ergün, and Sahinalp [SODA 2006].

Our technical contribution is twofold. First, we rely on finding matches between substrings in the two strings, where two substrings are considered a match if their edit distance is relatively small, a prevailing technique in commonly used heuristics, such as PatternHunter of Ma, Tromp and Li [Bioinformatics, 2002]. Second, we effectively reduce the smoothed edit distance to a simpler variant of (worst-case) edit distance, namely, edit distance on permutations (a.k.a. Ulam's metric). We are thus able to build on algorithms developed for the Ulam metric, whose much better algorithmic guarantees usually do not carry over to general edit distance.
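The semi-random generation procedure described in the abstract can be illustrated with a small sketch. This is not the authors' code and it does not implement their near-linear or sublinear algorithms. It assumes that "perturbed" means flipping a bit (the abstract does not spell this out), it represents strings as lists of 0/1 integers, and the names `lcs_alignment`, `smooth`, and `edit_distance` are hypothetical helpers introduced here. The edit distance routine is the textbook quadratic dynamic program, included only so that smoothed instances can be inspected.

```python
import random

def lcs_alignment(x, y):
    """Return one longest common subsequence as index pairs (i, j) with x[i] == y[j]."""
    n, m = len(x), len(y)
    # table[i][j] = LCS length of the suffixes x[i:] and y[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if x[i] == y[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if x[i] == y[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs

def smooth(x, y, alignment, p, rng):
    """Flip each bit with probability p (assumed perturbation).

    Positions paired by `alignment` share a single coin flip, so the
    chosen common subsequence is perturbed identically in both strings.
    """
    xs, ys = list(x), list(y)
    matched_x = {i for i, _ in alignment}
    matched_y = {j for _, j in alignment}
    for i, j in alignment:
        if rng.random() < p:
            xs[i] ^= 1
            ys[j] ^= 1
    for i in range(len(xs)):
        if i not in matched_x and rng.random() < p:
            xs[i] ^= 1
    for j in range(len(ys)):
        if j not in matched_y and rng.random() < p:
            ys[j] ^= 1
    return xs, ys

def edit_distance(a, b):
    """Textbook O(len(a) * len(b)) edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

if __name__ == "__main__":
    rng = random.Random(2008)           # arbitrary seed for repeatability
    d = 12
    x = [rng.randint(0, 1) for _ in range(d)]
    y = [rng.randint(0, 1) for _ in range(d)]
    A = lcs_alignment(x, y)             # stands in for the adversary's choice of A
    xs, ys = smooth(x, y, A, p=0.3, rng=rng)
    print("before:", edit_distance(x, y), "after:", edit_distance(xs, ys))
```

Because aligned positions are flipped together, A remains a common subsequence of the smoothed strings, so under these assumptions their edit distance is at most 2(d − |A|). Here the adversary is simulated by random strings and an arbitrary LCS; in the model itself both choices are adversarial.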