The smoothed complexity of edit distance

Authors:
Alexandr Andoni; Robert Krauthgamer
Affiliations:
Microsoft Research SVC, CA; The Weizmann Institute of Science, Israel
Venue:
ACM Transactions on Algorithms (TALG)
Year:
2012

Abstract

We initiate the study of the smoothed complexity of sequence alignment, by proposing a semi-random model of edit distance between two input strings, generated as follows: First, an adversary chooses two binary strings of length d and a longest common subsequence A of them. Then, every character is perturbed independently with probability p, except that A is perturbed in exactly the same way inside the two strings. We design two efficient algorithms that compute the edit distance on smoothed instances up to a constant-factor approximation. The first algorithm runs in near-linear time, namely d^{1+ε} for any fixed ε > 0. The second one runs in time sublinear in d, assuming the edit distance is not too small. These approximation and runtime guarantees are significantly better than the bounds that were known for worst-case inputs. Our technical contribution is twofold. First, we rely on finding matches between substrings in the two strings, where two substrings are considered a match if their edit distance is relatively small, a prevailing technique in commonly used heuristics, such as PatternHunter of Ma et al. [2002]. Second, we effectively reduce the smoothed edit distance to a simpler variant of (worst-case) edit distance, namely, edit distance on permutations (a.k.a. Ulam's metric). We are thus able to build on algorithms developed for the Ulam metric, whose much better algorithmic guarantees usually do not carry over to general edit distance.
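The semi-random input model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that "perturbed" means flipping a bit, and the helper names `lcs_alignment` and `smooth_instance` are hypothetical, introduced only for this illustration. The adversary's choice of A is played here by a standard quadratic-time dynamic program that picks one longest common subsequence.

```python
import random


def lcs_alignment(x, y):
    """Return one longest common subsequence of x and y as (i, j) index pairs."""
    n, m = len(x), len(y)
    # table[i][j] holds the LCS length of the suffixes x[i:] and y[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if x[i] == y[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk forward through the table to recover the matched positions.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if x[i] == y[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def smooth_instance(x, y, pairs, p, rng):
    """Perturb x and y with probability p per character (assumed: a bit flip).

    Positions matched by `pairs` (the subsequence A) receive the same
    perturbation in both strings; all other positions are perturbed
    independently of each other.
    """
    xs, ys = list(x), list(y)
    in_a_x = {i for i, _ in pairs}
    in_a_y = {j for _, j in pairs}
    for i, j in pairs:
        if rng.random() < p:  # one coin per matched pair, applied to both
            xs[i] ^= 1
            ys[j] ^= 1
    for i in range(len(xs)):
        if i not in in_a_x and rng.random() < p:
            xs[i] ^= 1
    for j in range(len(ys)):
        if j not in in_a_y and rng.random() < p:
            ys[j] ^= 1
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    d = 16
    x = [rng.randint(0, 1) for _ in range(d)]
    y = [rng.randint(0, 1) for _ in range(d)]
    a = lcs_alignment(x, y)
    xs, ys = smooth_instance(x, y, a, p=0.25, rng=rng)
    print(xs, ys, len(a))
```

Because each matched pair shares one coin flip, A remains a common subsequence of the two perturbed strings, which is the property the model is built around.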