LCS approximation via embedding into locally non-repetitive strings

Authors:
G. M. Landau; A. Levy; I. Newman
Affiliations:
Department of Computer Science, University of Haifa, Haifa 31905, Israel and Department of Computer Science and Engineering, NYU-Poly, Six MetroTech Center, Brooklyn, NY 11201-3840, USA; Department of Software Engineering, Shenkar College, 12 Anna Frank, Ramat-Gan, Israel and CRI, University of Haifa, Mount Carmel, Haifa 31905, Israel; Department of Computer Science, University of Haifa, Haifa 31905, Israel
Venue:
Information and Computation
Year:
2011

Abstract

A classical measure of similarity between two strings is the length of their longest common subsequence (LCS). The search for efficient algorithms for finding the LCS has been going on for more than three decades. To date, all known algorithms may take quadratic time (up to logarithmic-factor improvements) to find a large LCS. In this paper, the problem of approximating the LCS is studied, focusing on the hard inputs for this problem, namely, approximating an LCS of near-linear size in strings over a relatively large alphabet (of size at least n^ε for some constant ε > 0, where n is the length of the string). We show that any given string over a relatively large alphabet can be embedded into a locally non-repetitive string. This embedding has a negligible additive distortion for strings that are not too dissimilar in terms of the edit distance. We also show that the LCS can be efficiently approximated in locally non-repetitive strings. Our new method (the embedding together with the approximation algorithm) gives a strictly sub-quadratic time algorithm (i.e., of complexity O(n^(2-ε)) for some constant ε) which can find common subsequences of linear (and near-linear) size that cannot be detected efficiently by the existing tools.
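
For reference, the quadratic-time baseline mentioned in the abstract can be illustrated with the textbook dynamic program for LCS length. This is only a minimal sketch of that classical exact method, not the embedding or the approximation algorithm of the paper; the function name `lcs_length` is chosen here for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Textbook dynamic program for LCS length.

    Runs in O(len(a) * len(b)) time and O(len(b)) space.
    Illustrative baseline only; not the paper's method.
    """
    # prev[j] = LCS length of the already-processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]  # LCS against the empty prefix of b is 0
        for j, other in enumerate(b, start=1):
            if ch == other:
                # Extend the best common subsequence of the shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last symbol of one prefix or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(lcs_length("ABCBDAB", "BDCABA"))
```

On two strings of length n this table has about n^2 entries, which is the quadratic cost that the abstract's sub-quadratic O(n^(2-ε)) approximation is measured against.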