Least random suffix/prefix matches in output-sensitive time

Authors:
Niko Välimäki
Affiliations:
Helsinki Institute for Information Technology, Department of Computer Science, University of Helsinki, Finland
Venue:
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Year:
2012

References:

Algorithms for approximate string matching. Information and Control.
On finding lowest common ancestors: simplification and parallelization. SIAM Journal on Computing.
An efficient algorithm for the All Pairs Suffix-Prefix Problem. Information Processing Letters.
Algorithms on strings, trees, and sequences: computer science and computational biology.
Incremental String Comparison. SIAM Journal on Computing.
Deterministic dictionaries. Journal of Algorithms.
Dictionary matching and indexing with errors and don't cares. STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing.
Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms (TALG).
Compressed Suffix Trees with Full Functionality. Theory of Computing Systems.
Dynamic entropy-compressed sequences and full-text indexes. ACM Transactions on Algorithms (TALG).
Efficient algorithms for the all-pairs suffix-prefix problem and the all-pairs substring-prefix problem. Information Processing Letters.
A linear size index for approximate pattern matching. CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching.
Approximate all-pairs suffix/prefix overlaps. Information and Computation.

Abstract

We study the problem of finding suffix/prefix matches (overlaps) in a set of $r$ strings of total length $n$. Gusfield et al. (1992) gave an algorithm that finds the longest exact overlaps between all string pairs in the optimal $O(n + t_{\mathrm{output}})$ time, where $t_{\mathrm{output}} \le r^2$ is the number of non-zero-length overlaps found. So far, the best worst-case time for finding approximate overlaps within edit distance $k$ has been $O(knr)$ (Landau et al. 1998), which gives $\Omega(r^2)$ time regardless of the output size. We propose the first output-sensitive algorithm to find either the longest or the least random approximate overlaps. Given the maximum edit distance $k$ allowed in an overlap, the approximate overlaps can be found in linear space and in $O((n + t_{\mathrm{output}}) \, \mathrm{polylog}(n))$ time for any constant $k$. If all input strings are shorter than $\log n/(k^{1/k}\sigma)$, we achieve the time complexity $O(n \log^k n + t_{\mathrm{output}})$ for any $k$. For strings longer than $\varepsilon \log^k r$, we improve the previous best worst-case time from $O(knr)$ to $O(\frac{c^k}{k!}nr)$ for moderate $k$ and constants $c > 1$ and $\varepsilon > 0$.
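
To make the problem statement concrete, the following is a minimal brute-force sketch of exact and approximate suffix/prefix overlaps. It is not the algorithm proposed in the paper and has none of its time bounds; it only illustrates what is being computed. All function names and the `min_len` parameter (a threshold that excludes trivially short overlaps) are assumptions made for this illustration.

```python
from itertools import permutations


def longest_exact_overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for length in range(min(len(a), len(b)), 0, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def all_pairs_exact(strings):
    """Map each ordered pair (i, j), i != j, to its non-zero longest exact overlap."""
    overlaps = {}
    for i, j in permutations(range(len(strings)), 2):
        length = longest_exact_overlap(strings[i], strings[j])
        if length > 0:
            overlaps[(i, j)] = length
    return overlaps


def edit_distance(s: str, t: str) -> int:
    """Unit-cost edit distance, computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # match or substitute
        prev = cur
    return prev[-1]


def longest_approx_overlap(a: str, b: str, k: int, min_len: int = 1):
    """Longest suffix of a (at least min_len long) within edit distance k of
    some non-empty prefix of b.

    Returns (suffix_len, prefix_len, distance), or None if no overlap qualifies.
    min_len is a hypothetical parameter for this sketch, not from the paper.
    """
    for s_len in range(len(a), max(min_len, 1) - 1, -1):
        suffix = a[-s_len:]
        best = None
        # A prefix within distance k differs in length from the suffix by at most k.
        for p_len in range(max(1, s_len - k), min(len(b), s_len + k) + 1):
            d = edit_distance(suffix, b[:p_len])
            if d <= k and (best is None or d < best[2]):
                best = (s_len, p_len, d)
        if best is not None:
            return best
    return None
```

On the strings "abcde", "cdefg" and "fgab", `all_pairs_exact` reports three non-zero overlaps out of six ordered pairs, which is the quantity the abstract calls $t_{\mathrm{output}}$. Allowing `k = 1` lets "xxACGTAC" overlap "ACGAACzz" by six characters with one substitution, whereas the exact overlap is only two characters.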