Least random suffix/prefix matches in output-sensitive time

  • Authors:
  • Niko Välimäki

  • Affiliations:
  • Helsinki Institute for Information Technology, Department of Computer Science, University of Helsinki, Finland

  • Venue:
  • CPM'12: Proceedings of the 23rd Annual Conference on Combinatorial Pattern Matching
  • Year:
  • 2012

Abstract

We study the problem of finding suffix/prefix matches (overlaps) when given a set of $r$ strings of total length $n$. Gusfield et al. (1992) gave an algorithm to find the longest exact overlaps between all string pairs in the optimal $O(n + t_{\mathrm{output}})$ time, where $t_{\mathrm{output}} \le r^2$ is the number of non-zero-length overlaps found. So far, the best worst-case time for finding approximate overlaps within edit distance $k$ has been $O(knr)$ (Landau et al. 1998), which gives $\Omega(r^2)$ time regardless of the output size. We propose the first output-sensitive algorithm to find either the longest or the least random approximate overlaps. Given the maximum edit distance $k$ allowed in an overlap, the approximate overlaps can be found in linear space and in $O((n + t_{\mathrm{output}}) \operatorname{polylog}(n))$ time for any constant $k$. If all input strings are shorter than $\log n / (k^{1/k}\sigma)$, where $\sigma$ is the alphabet size, we achieve the time complexity $O(n \log^k n + t_{\mathrm{output}})$ for any $k$. For strings longer than $\varepsilon \log^k r$, we improve the previous best worst-case time from $O(knr)$ to $O(\frac{c^k}{k!} nr)$ for moderate $k$ and constants $c > 1$ and $\varepsilon > 0$.
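To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is neither the paper's output-sensitive algorithm nor the suffix-tree method of Gusfield et al.; it simply runs a semi-global edit-distance DP on every ordered pair, costing $O(|A| \cdot |B|)$ per pair. The function names `longest_approx_overlap` and `all_pairs_overlaps` are hypothetical, and the overlap definition is an assumption for illustration: an overlap of $A$ onto $B$ is a prefix of $B$ within edit distance $k$ of some suffix of $A$, "longest" is measured by the prefix length, self-pairs are skipped, and prefixes of length at most $k$ are ignored as trivial. The paper's own definitions, including its randomness-based score, may differ.

```python
from itertools import permutations


def longest_approx_overlap(a: str, b: str, k: int) -> int:
    """Length of the longest prefix of b within edit distance k of some
    suffix of a (brute-force semi-global DP, O(len(a) * len(b)) time).

    Assumption (illustrative only): prefixes of length <= k are ignored,
    since any prefix of length j is within distance j of the empty suffix.
    Returns 0 if no non-trivial overlap exists.
    """
    m, n = len(a), len(b)
    # prev[j]: min edit distance between b[:j] and any suffix of a[:i].
    # Row i = 0: only the empty suffix exists, so b[:j] costs j insertions.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [0] * (n + 1)  # cur[0] = 0: the suffix may start anywhere in a
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # match or substitution
                prev[j] + 1,                           # a[i-1] left unmatched
                cur[j - 1] + 1,                        # b[j-1] left unmatched
            )
        prev = cur
    # prev is now the last row: distances to suffixes of the whole of a.
    for j in range(n, k, -1):
        if prev[j] <= k:
            return j
    return 0


def all_pairs_overlaps(strings: list[str], k: int) -> dict[tuple[int, int], int]:
    """Naive all-pairs baseline: maps (i, j), i != j, to the overlap length
    of strings[i] onto strings[j] for every pair with a non-trivial overlap.
    len(result) corresponds to t_output."""
    result = {}
    for i, j in permutations(range(len(strings)), 2):
        length = longest_approx_overlap(strings[i], strings[j], k)
        if length > 0:
            result[(i, j)] = length
    return result


if __name__ == "__main__":
    reads = ["ACGTTGCA", "TGCATTAG", "TTAGACGT"]
    print(all_pairs_overlaps(reads, 0))
    # {(0, 1): 4, (1, 2): 4, (2, 0): 4, (2, 1): 1}
```

With $k = 0$ the DP reduces to exact suffix/prefix matching, so the number of reported pairs is $t_{\mathrm{output}}$ in the abstract's notation. The sketch also shows why pair-by-pair methods cost $\Omega(r^2)$ even when few pairs overlap: every ordered pair is examined regardless of the output size.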