The Smoothed Complexity of Edit Distance

Authors:
Alexandr Andoni; Robert Krauthgamer
Affiliations:
MIT; Weizmann Institute and IBM Almaden
Venue:
ICALP '08 Proceedings of the 35th international colloquium on Automata, Languages and Programming, Part I
Year:
2008

Citing 0
Cited 7

Overcoming the l1 non-embeddability barrier: algorithms for product metrics
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms

Smoothed analysis: an attempt to explain the behavior of algorithms in practice
Communications of the ACM - A View of Parallel Computing

Why greed works for shortest common superstring problem
Theoretical Computer Science

Near-optimal sublinear time algorithms for Ulam distance
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms

Why large CLOSEST STRING instances are easy to solve in practice
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval

The bounded search tree algorithm for the closest string problem has quadratic smoothed complexity
MFCS'11 Proceedings of the 36th international conference on Mathematical foundations of computer science

The smoothed complexity of edit distance
ACM Transactions on Algorithms (TALG)

Abstract

We initiate the study of the smoothed complexity of sequence alignment, by proposing a semi-random model of edit distance between two input strings, generated as follows. First, an adversary chooses two binary strings of length d and a longest common subsequence A of them. Then, every character is perturbed independently with probability p, except that A is perturbed in exactly the same way inside the two strings.

We design two efficient algorithms that compute the edit distance on smoothed instances up to a constant factor approximation. The first algorithm runs in near-linear time, namely d^(1+ε) for any fixed ε > 0. The second one runs in time sublinear in d, assuming the edit distance is not too small. These approximation and runtime guarantees are significantly better than the bounds known for worst-case inputs, e.g., a near-linear time algorithm achieving approximation roughly d^(1/3), due to Batu, Ergün, and Sahinalp [SODA 2006].

Our technical contribution is twofold. First, we rely on finding matches between substrings in the two strings, where two substrings are considered a match if their edit distance is relatively small, a prevailing technique in commonly used heuristics, such as PatternHunter of Ma, Tromp and Li [Bioinformatics, 2002]. Second, we effectively reduce the smoothed edit distance to a simpler variant of (worst-case) edit distance, namely, edit distance on permutations (a.k.a. Ulam's metric). We are thus able to build on algorithms developed for the Ulam metric, whose much better algorithmic guarantees usually do not carry over to general edit distance.
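
The semi-random model described in the abstract can be illustrated with a minimal Python sketch. Several choices here are assumptions, not taken from the paper: the function names `smoothed_instance` and `edit_distance`, the representation of the common subsequence A as a list of index pairs `(i, j)`, and the reading of "perturbed" as a bit flip. The `edit_distance` function is the textbook quadratic-time dynamic program, included only to check small instances; it is not either of the paper's algorithms.

```python
import random


def smoothed_instance(x, y, alignment, p, rng=None):
    """Perturb binary strings x and y under the (assumed) semi-random model.

    alignment: list of (i, j) pairs, increasing in both coordinates, with
    x[i] == y[j]; it plays the role of the common subsequence A.
    Each character is flipped independently with probability p, except that
    an aligned pair is flipped (or kept) together.
    """
    rng = rng or random.Random()
    flip = {"0": "1", "1": "0"}
    xs, ys = list(x), list(y)
    partner = dict(alignment)          # position in x -> aligned position in y
    aligned_in_y = set(partner.values())

    for i in range(len(xs)):
        if rng.random() < p:
            xs[i] = flip[xs[i]]
            if i in partner:           # same perturbation on both copies of A
                j = partner[i]
                ys[j] = flip[ys[j]]

    for j in range(len(ys)):
        # Unaligned positions of y get their own independent coin.
        if j not in aligned_in_y and rng.random() < p:
            ys[j] = flip[ys[j]]

    return "".join(xs), "".join(ys)


def edit_distance(a, b):
    """Textbook O(len(a) * len(b)) edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    x, y = "0101", "1010"
    A = [(1, 0), (2, 1), (3, 2)]       # x[1:4] == y[0:3] == "101"
    xp, yp = smoothed_instance(x, y, A, p=0.25, rng=random.Random(0))
    print(xp, yp, edit_distance(xp, yp))
```

Because aligned characters are perturbed together, the pairs in `alignment` still match after smoothing, so the chosen subsequence survives as a common subsequence of the perturbed strings.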