Approximating Edit Distance Efficiently

Authors:
Ziv Bar-Yossef;T. S. Jayram;Robert Krauthgamer;Ravi Kumar
Affiliations:
Technion;IBM Almaden Research Center;IBM Almaden Research Center;IBM Almaden Research Center
Venue:
FOCS '04 Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science
Year:
2004

Citing 0
Cited 29

Low distortion embeddings for edit distance

Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
The intractability of computing the Hamming distance

Theoretical Computer Science
Nonembeddability theorems via Fourier analysis

FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science
Oblivious string embeddings and edit distance approximations

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Improved lower bounds for embeddings into L1

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
A dictionary for approximate string search and longest prefix search

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
A relation between edit distance for ordered trees and edit distance for Euler strings

Information Processing Letters
An Efficient Web Page Change Detection System Based on an Optimized Hungarian Algorithm

IEEE Transactions on Knowledge and Data Engineering
Estimating the sortedness of a data stream

SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Low distortion embeddings for edit distance

Journal of the ACM (JACM)
Edit distance for a run-length-encoded string and an uncompressed string

Information Processing Letters
Vector representations for efficient comparison and search for similar strings

Cybernetics and Systems Analysis
An approach for continuous inspection of source code

Proceedings of the 6th international workshop on Software quality
Sketching in adversarial environments

STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Self-tuning query mesh for adaptive multi-route query processing

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Approximating edit distance in near-linear time

Proceedings of the forty-first annual ACM symposium on Theory of computing
LCS Approximation via Embedding into Local Non-repetitive Strings

CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
LCS approximation via embedding into locally non-repetitive strings

Information and Computation
The Computational Hardness of Estimating Edit Distance

SIAM Journal on Computing
Approximate String Processing

Foundations and Trends in Databases
Polylogarithmic approximation for edit distance and the asymmetric query complexity

Property testing
Space lower bounds for online pattern matching

CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
Locally consistent parsing and applications to approximate string comparisons

DLT'05 Proceedings of the 9th international conference on Developments in Language Theory
The smoothed complexity of edit distance

ACM Transactions on Algorithms (TALG)
Improved sketching of hamming distance with error correcting

CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
Efficient communication protocols for deciding edit distance

ESA'12 Proceedings of the 20th Annual European conference on Algorithms
Space lower bounds for online pattern matching

Theoretical Computer Science
Homomorphic fingerprints under misalignments: sketching edit and shift distances

Proceedings of the forty-fifth annual ACM symposium on Theory of computing

Abstract

Edit distance has been extensively studied for the past several years. Nevertheless, no linear-time algorithm is known to compute the edit distance between two strings, or even to approximate it to within a modest factor. Furthermore, for various natural algorithmic problems such as low-distortion embeddings into normed spaces, approximate nearest-neighbor schemes, and sketching algorithms, known results for the edit distance are rather weak. We develop algorithms that solve gap versions of the edit distance problem: given two strings of length n with the promise that their edit distance is either at most k or greater than \ell, decide which of the two holds. We present two sketching algorithms for gap versions of edit distance. Our first algorithm solves the k vs. (kn)^{2/3} gap problem, using a constant size sketch. A more involved algorithm solves the stronger k vs. \ell gap problem, where \ell can be as small as O(k^2), still with a constant sketch, but works only for strings that are mildly "non-repetitive". Finally, we develop an n^{3/7}-approximation quasi-linear time algorithm for edit distance, improving the previous best factor of n^{3/4} [5]; if the input strings are assumed to be non-repetitive, then the approximation factor can be strengthened to n^{1/3}.
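
To make the gap formulation in the abstract concrete, the following is a minimal Python sketch of the k vs. \ell decision problem. It is not the paper's sketching algorithm: it decides the gap by computing the exact distance with the textbook quadratic-time dynamic program, which is the baseline the paper aims to beat. The function names (`edit_distance`, `gap_decide`, `first_gap_threshold`) are hypothetical, and the sketch assumes the standard definition of edit distance with unit-cost insertions, deletions, and substitutions.

```python
from typing import Optional


def edit_distance(x: str, y: str) -> int:
    """Exact edit distance via the textbook O(|x| * |y|) dynamic program.

    Assumes unit-cost insertions, deletions, and substitutions.
    """
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, cx in enumerate(x, start=1):
        curr = [i]  # distance from x[:i] to the empty prefix of y
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def gap_decide(x: str, y: str, k: int, ell: float) -> Optional[str]:
    """Decide the k vs. ell gap problem by brute force.

    Returns "close" if the distance is at most k, "far" if it exceeds ell,
    and None if the input violates the promise (distance in (k, ell]).
    """
    d = edit_distance(x, y)
    if d <= k:
        return "close"
    if d > ell:
        return "far"
    return None


def first_gap_threshold(k: int, n: int) -> float:
    """The 'far' threshold (kn)^{2/3} of the first gap problem in the abstract."""
    return (k * n) ** (2.0 / 3.0)
```

For example, `gap_decide(x, y, k, first_gap_threshold(k, len(x)))` poses the first gap problem from the abstract for two strings of equal length n. A sketching algorithm would answer the same question from short summaries of x and y rather than from the full strings.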