A Subquadratic Sequence Alignment Algorithm for Unrestricted Scoring Matrices

Authors:
Maxime Crochemore; Gad M. Landau; Michal Ziv-Ukelson
Venue:
SIAM Journal on Computing
Year:
2003

Citing 0
Cited 37

Bioinformatics on a Heterogeneous Java Distributed System

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 5 - Volume 06
Speeding up transposition-invariant string matching

Information Processing Letters
Longest common subsequence problem for unoriented and cyclic strings

Theoretical Computer Science
An efficient alignment algorithm for masked sequences

Theoretical Computer Science
Edit distance for a run-length-encoded string and an uncompressed string

Information Processing Letters
Computing similarity of run-length encoded strings with affine gap penalty

Theoretical Computer Science
Fast Algorithms for Computing Tree LCS

CPM '08 Proceedings of the 19th annual symposium on Combinatorial Pattern Matching
Sequence Alignment Algorithms for Run-Length-Encoded Strings

COCOON '08 Proceedings of the 14th annual international conference on Computing and Combinatorics
Semi-local longest common subsequences in subquadratic time

Journal of Discrete Algorithms
Brief Communication: Whole genome assembly from 454 sequencing output via modified DNA graph concept

Computational Biology and Chemistry
LCS Approximation via Embedding into Local Non-repetitive Strings

CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
Approximate Matching for Run-Length Encoded Strings Is 3sum-Hard

CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
Fast algorithms for computing tree LCS

Theoretical Computer Science
Improved approximate string matching and regular expression matching on Ziv-Lempel compressed texts

ACM Transactions on Algorithms (TALG)
A faster algorithm for the computation of string convolutions using LZ78 parsing

Information Processing Letters
Bit-Parallel Algorithm for the Constrained Longest Common Subsequence Problem

Fundamenta Informaticae
Hardness of comparing two run-length encoded strings

Journal of Complexity
Fast computation of a longest increasing subsequence and application

Information and Computation
Cache-Oblivious Dynamic Programming for Bioinformatics

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
A fully compressed algorithm for computing the edit distance of run-length encoded strings

ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part I
Multiplication algorithms for Monge matrices

SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
LCS approximation via embedding into locally non-repetitive strings

Information and Computation
Have your spaghetti and eat it too: evolutionary algorithmics and post-evolutionary analysis

Genetic Programming and Evolvable Machines
Towards approximate matching in compressed strings: local subsequence recognition

CSR'11 Proceedings of the 6th international conference on Computer science: theory and applications
All semi-local longest common subsequences in subquadratic time

CSR'06 Proceedings of the First international computer science conference on Theory and Applications
Random access to grammar-compressed strings

Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms
Monge properties of sequence alignment

Theoretical Computer Science
SA-REPC: sequence alignment with regular expression path constraint

LATA'10 Proceedings of the 4th international conference on Language and Automata Theory and Applications
Computing similarity of run-length encoded strings with affine gap penalty

SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
Bidirectional delta files

Information Processing and Management: an International Journal
Computing a longest increasing subsequence of length k in time O(n log log k)

VoCS'08 Proceedings of the 2008 international conference on Visions of Computer Science: BCS International Academic Conference
Fast and cache-oblivious dynamic programming with local dependencies

LATA'12 Proceedings of the 6th international conference on Language and Automata Theory and Applications
Speeding up q-gram mining on grammar-based compressed texts

CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Pattern discovery in annotated dialogues using dynamic programming

International Journal of Intelligent Information and Database Systems
Efficient LZ78 factorization of grammar compressed text

SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
A divide and conquer approach and a work-optimal parallel algorithm for the LIS problem

Information Processing Letters
Computing a longest common subsequence that is almost increasing on sequences having no repeated elements

Journal of Discrete Algorithms

Quantified Score

Hi-index	0.01

Abstract

Given two strings of size $n$ over a constant alphabet, the classical algorithm for computing the similarity between two sequences [D. Sankoff and J. B. Kruskal, eds., Time Warps, String Edits, and Macromolecules, Addison-Wesley, Reading, MA, 1983; T. F. Smith and M. S. Waterman, J. Molec. Biol., 147 (1981), pp. 195-197] uses a dynamic programming matrix and compares the two strings in $O(n^2)$ time. We address the challenge of computing the similarity of two strings in subquadratic time for metrics which use a scoring matrix of unrestricted weights. Our algorithm applies to both local and global similarity computations. The speed-up is achieved by dividing the dynamic programming matrix into variable-sized blocks, as induced by Lempel-Ziv parsing of both strings, and utilizing the inherent periodic nature of both strings. This leads to an $O(n^2 / \log n)$ algorithm for an input of constant alphabet size. For most texts, the time complexity is actually $O(h n^2 / \log n)$, where $h \le 1$ is the entropy of the text. We also present an algorithm for comparing two run-length encoded strings of length $m$ and $n$, compressed into $m'$ and $n'$ runs, respectively, with $O(m'n + n'm)$ time complexity. This result extends to all distance or similarity scoring schemes that use an additive gap penalty.
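For orientation, the two ingredients named in the abstract can be sketched in Python. This is only an illustration under stated assumptions, not the paper's subquadratic algorithm: `global_similarity` is the classical quadratic baseline with an arbitrary scoring function and an additive per-symbol gap penalty, and `lz78_phrases` assumes the LZ78 variant of Lempel-Ziv parsing to show where the variable-sized block boundaries would fall. Both function names and the example scoring scheme are hypothetical.

```python
def global_similarity(a, b, score, gap):
    """Classical O(len(a) * len(b)) global similarity (quadratic baseline).

    score(x, y) may return any weight (unrestricted scoring matrix);
    gap is the additive penalty charged per inserted or deleted symbol.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            cur.append(max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align two symbols
                prev[j] + gap,                            # delete a[i-1]
                cur[j - 1] + gap,                         # insert b[j-1]
            ))
        prev = cur
    return prev[-1]


def lz78_phrases(s):
    """LZ78 parse: each phrase is a previously seen phrase plus one symbol."""
    seen = {""}
    phrases = []
    current = ""
    for ch in s:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # The final phrase may repeat an earlier one when the input runs out.
        phrases.append(current)
    return phrases


def unit_score(x, y):
    # Example scoring scheme (an assumption): +1 for a match, -1 otherwise.
    return 1 if x == y else -1


if __name__ == "__main__":
    print(global_similarity("ACGT", "ACGT", unit_score, -1))
    print(lz78_phrases("aababcabcd"))
```

The phrase boundaries returned by `lz78_phrases` for each string are what would partition the dynamic programming matrix into blocks; the per-block computation that yields the speed-up is not reproduced here.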