Given two strings of length $n$ over a constant alphabet, the classical algorithm for computing the similarity between two sequences [D. Sankoff and J. B. Kruskal, eds., Time Warps, String Edits, and Macromolecules, Addison-Wesley, Reading, MA, 1983; T. F. Smith and M. S. Waterman, J. Molec. Biol., 147 (1981), pp. 195-197] uses a dynamic programming matrix and compares the two strings in $O(n^2)$ time. We address the challenge of computing the similarity of two strings in subquadratic time for metrics that use a scoring matrix of unrestricted weights. Our algorithm applies to both local and global similarity computations. The speed-up is achieved by dividing the dynamic programming matrix into variable-sized blocks, as induced by Lempel-Ziv parsing of both strings, and by exploiting the inherent periodic nature of both strings. This leads to an $O(n^2 / \log n)$ algorithm for inputs over a constant-size alphabet. For most texts, the time complexity is actually $O(h n^2 / \log n)$, where $h \le 1$ is the entropy of the text. We also present an algorithm that compares two run-length encoded strings of lengths $m$ and $n$, compressed into $m'$ and $n'$ runs, respectively, in $O(m'n + n'm)$ time. This result extends to all distance or similarity scoring schemes that use an additive gap penalty.
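The block partition of the dynamic programming matrix can be illustrated with a short Python sketch. It assumes LZ78-style parsing (the abstract says only "Lempel-Ziv parsing"), and the function names `lz78_phrases`, `parents`, `offsets`, and `lz_blocks` are hypothetical. In an LZ78 parse every phrase is an earlier phrase extended by one character, so the block for a phrase pair covers the same characters as the block for its parent phrase plus one extra row; this is the kind of repetition a block-wise scheme can exploit. Over a constant alphabet an LZ78 parse of a length-$n$ string has $O(n/\log n)$ phrases, giving $O(n^2/\log^2 n)$ blocks. The sketch only builds the partition and the parent links; it does not compute alignment scores inside the blocks, so it does not by itself achieve the $O(n^2/\log n)$ bound.

```python
from typing import List, Tuple

# ((row_start, row_end), (col_start, col_end)), half-open character ranges
Block = Tuple[Tuple[int, int], Tuple[int, int]]


def lz78_phrases(s: str) -> List[str]:
    """Greedy LZ78 parse (assumed variant): each new phrase is the
    longest already-known phrase followed by one more character."""
    seen = {""}              # phrase dictionary, seeded with the empty phrase
    phrases: List[str] = []
    cur = ""
    for ch in s:
        if cur + ch in seen:
            cur += ch        # keep extending a phrase we already know
        else:
            phrases.append(cur + ch)
            seen.add(cur + ch)
            cur = ""
    if cur:
        phrases.append(cur)  # leftover suffix that repeats an earlier phrase
    return phrases


def parents(phrases: List[str]) -> List[int]:
    """For LZ78 output only: index of the earlier phrase that each phrase
    extends by one character (-1 means the empty phrase)."""
    first_index = {"": -1}
    out: List[int] = []
    for k, p in enumerate(phrases):
        out.append(first_index[p[:-1]])
        first_index.setdefault(p, k)  # keep the first occurrence of a phrase
    return out


def offsets(phrases: List[str]) -> List[int]:
    """Start offset of every phrase, followed by the total length."""
    bounds = [0]
    for p in phrases:
        bounds.append(bounds[-1] + len(p))
    return bounds


def lz_blocks(a: str, b: str) -> List[Block]:
    """Pair every LZ78 phrase of a (rows) with every LZ78 phrase of b (columns)."""
    ra = offsets(lz78_phrases(a))
    rb = offsets(lz78_phrases(b))
    return [((ra[i], ra[i + 1]), (rb[j], rb[j + 1]))
            for i in range(len(ra) - 1)
            for j in range(len(rb) - 1)]
```

For example, `lz78_phrases("abababab")` yields `["a", "b", "ab", "aba", "b"]`, and `parents` of that list is `[-1, -1, 0, 2, -1]`: the phrase `"aba"` extends `"ab"`, so each of its blocks adds a single row to the corresponding block of `"ab"`.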