The Smoothed Complexity of Edit Distance
ICALP '08 Proceedings of the 35th international colloquium on Automata, Languages and Programming, Part I
We initiate the study of the smoothed complexity of sequence alignment by proposing a semi-random model of edit distance between two input strings, generated as follows. First, an adversary chooses two binary strings of length d and a longest common subsequence A of them. Then, every character is perturbed independently with probability p, except that A is perturbed in exactly the same way inside the two strings. We design two efficient algorithms that compute the edit distance on smoothed instances up to a constant-factor approximation. The first algorithm runs in near-linear time, namely d^{1+ε} for any fixed ε > 0. The second one runs in time sublinear in d, assuming the edit distance is not too small. These approximation and runtime guarantees are significantly better than the bounds that were known for worst-case inputs. Our technical contribution is twofold. First, we rely on finding matches between substrings in the two strings, where two substrings are considered a match if their edit distance is relatively small, a prevailing technique in commonly used heuristics, such as PatternHunter of Ma et al. [2002]. Second, we effectively reduce the smoothed edit distance to a simpler variant of (worst-case) edit distance, namely, edit distance on permutations (a.k.a. Ulam's metric). We are thus able to build on algorithms developed for the Ulam metric, whose much better algorithmic guarantees usually do not carry over to general edit distance.
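The semi-random input model described above can be illustrated with a minimal sketch. It is not the authors' code, and it rests on two assumptions that the abstract does not settle: "perturbed" is read as flipping a bit, and the adversary's choice of A is stood in for by one longest common subsequence found with a textbook quadratic dynamic program. The names `lcs_alignment` and `smooth` are hypothetical helpers introduced only for this illustration.

```python
import random


def lcs_alignment(x, y):
    """Return one longest common subsequence of x and y as index pairs (i, j)."""
    n, m = len(x), len(y)
    # table[i][j] = length of an LCS of the suffixes x[i:] and y[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if x[i] == y[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table forward to recover the matched positions.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if x[i] == y[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def smooth(x, y, alignment, p, rng):
    """Perturb two bit lists with probability p per character.

    Assumption: a perturbation is a bit flip. Positions paired by
    `alignment` share a single coin, so they are perturbed identically
    in both strings; every other position gets its own independent coin.
    """
    xs, ys = list(x), list(y)
    matched_x = {i for i, _ in alignment}
    matched_y = {j for _, j in alignment}
    for i, j in alignment:
        if rng.random() < p:
            xs[i] ^= 1
            ys[j] ^= 1
    for i in range(len(xs)):
        if i not in matched_x and rng.random() < p:
            xs[i] ^= 1
    for j in range(len(ys)):
        if j not in matched_y and rng.random() < p:
            ys[j] ^= 1
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    d = 20
    x = [rng.randint(0, 1) for _ in range(d)]
    y = [rng.randint(0, 1) for _ in range(d)]
    a = lcs_alignment(x, y)
    xs, ys = smooth(x, y, a, p=0.25, rng=rng)
    # The aligned positions still agree after smoothing.
    print(all(xs[i] == ys[j] for i, j in a))
```

Because aligned positions share one coin, A remains a common subsequence of the two smoothed strings, so the perturbation cannot push their edit distance above what the adversary's alignment allows. Whether A is still a longest common subsequence after smoothing is a separate question that this sketch does not address.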