A sublinear algorithm for weakly approximating edit distance

Authors:
Tugkan Batu;Funda Ergün;Joe Kilian;Avner Magen;Sofya Raskhodnikova;Ronitt Rubinfeld;Rahul Sami
Affiliations:
University of Pennsylvania, Philadelphia, PA;Case Western Reserve University;NEC Laboratories America;University of Toronto, Toronto, ON, CANADA;MIT, Cambridge, MA;NEC Laboratories America;Yale University, New Haven, CT
Venue:
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Year:
2003

Abstract

We show how to determine whether the edit distance between two given strings is small in sublinear time. Specifically, we present a test which, given two n-character strings A and B, runs in time o(n) and with high probability returns "CLOSE" if their edit distance is O(n^α), and "FAR" if their edit distance is Ω(n), where α is a fixed parameter less than 1. Our algorithm for testing the edit distance works by recursively subdividing the strings A and B into smaller substrings and looking for pairs of substrings in A, B with small edit distance. To do this, we query both strings at random places using a special technique for economizing on the samples which does not pick the samples independently and provides better query and overall complexity. As a result, our test runs in time Õ(n^max(α/2, 2α − 1)) for any fixed α < 1. We also show a lower bound of Ω(n^(α/2)) on the query complexity of every algorithm that distinguishes pairs of strings with edit distance at most n^α from those with edit distance at least n/6.
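
To make the gap problem in the abstract concrete, the following is a minimal sketch of a reference check. It is not the paper's sublinear test: it computes the exact edit distance with the standard quadratic dynamic program and then reports which side of the gap a pair of strings falls on. The function names (`edit_distance`, `gap_label`, `runtime_exponent`) and the extra "EITHER" label are assumptions made for illustration, and the thresholds n^α and n/6 are borrowed from the lower-bound statement, since the abstract leaves the constants inside O(·) and Ω(·) unspecified.

```python
def edit_distance(a: str, b: str) -> int:
    """Exact edit (Levenshtein) distance via the classic O(|a|*|b|) dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def gap_label(a: str, b: str, alpha: float) -> str:
    """Classify a pair of equal-length strings against the gap thresholds.

    Assumed thresholds: "CLOSE" if the distance is at most n**alpha,
    "FAR" if it is at least n/6. Pairs in between are outside the promise,
    so a tester may answer either way; this sketch reports "EITHER".
    The gap is only meaningful when n is large enough that n**alpha < n/6.
    """
    n = len(a)
    d = edit_distance(a, b)
    if d <= n ** alpha:
        return "CLOSE"
    if d >= n / 6:
        return "FAR"
    return "EITHER"


def runtime_exponent(alpha: float) -> float:
    """Exponent e in the stated running time of roughly n**e (up to log factors)."""
    return max(alpha / 2, 2 * alpha - 1)
```

The two terms in the exponent cross at α = 2/3: for α ≤ 2/3 the α/2 term dominates, and for larger α the 2α − 1 term does. For instance, `runtime_exponent(0.5)` evaluates to 0.25, meaning roughly n^(1/4) time for distinguishing distance about √n from linear distance.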