Approximate all-pairs suffix/prefix overlaps

  • Authors:
  • Niko Välimäki; Susana Ladra; Veli Mäkinen

  • Affiliations:
  • Department of Computer Science, University of Helsinki, Finland; Department of Computer Science, University of A Coruña, Spain; Department of Computer Science, University of Helsinki, Finland

  • Venue:
  • CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
  • Year:
  • 2010

Abstract

Finding approximate overlaps is the first phase of many sequence assembly methods. Given a set of r strings of total length n and an error rate ε, the goal is to find, for all pairs of strings, their suffix/prefix matches (overlaps) that are within edit distance k = ⌈εl⌉, where l is the length of the overlap. We propose new solutions for this problem based on backward backtracking (Lam et al. 2008) and suffix filters (Kärkkäinen and Na, 2008). The techniques use nH_k + o(n log σ) + r log r bits of space, where H_k is the k-th order entropy and σ is the alphabet size. In practice, the methods are easy to parallelize and scale up to millions of DNA reads.
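
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the backward backtracking or suffix filter method of the paper; it is a plain quadratic dynamic program that only illustrates what counts as an approximate suffix/prefix overlap. The function names `suffix_prefix_overlaps` and `all_pairs_overlaps` and the `min_len` parameter are hypothetical, and the sketch assumes that the overlap length l is measured as the length of the prefix of the second string.

```python
from math import ceil


def suffix_prefix_overlaps(a, b, eps, min_len=1):
    """Return (l, d) pairs where the prefix b[:l] is within edit distance
    d <= ceil(eps * l) of some suffix of a.

    Assumption: l is taken to be the prefix length of b.
    """
    m = len(b)
    # Row 0: the empty suffix of a against b[:j] costs j insertions.
    prev = list(range(m + 1))
    for i in range(1, len(a) + 1):
        # cur[0] = 0 because the matched suffix of a may start anywhere.
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            cur[j] = min(
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # match or substitution
                prev[j] + 1,                           # a[i-1] has no partner in b
                cur[j - 1] + 1,                        # b[j-1] has no partner in a
            )
        prev = cur
    # prev[j] is now the minimum edit distance between b[:j] and any suffix of a.
    return [(j, prev[j]) for j in range(min_len, m + 1)
            if prev[j] <= ceil(eps * j)]


def all_pairs_overlaps(reads, eps, min_len=1):
    """Map each ordered pair (i, j), i != j, with at least one overlap
    to its list of (l, d) overlaps."""
    result = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            overlaps = suffix_prefix_overlaps(a, b, eps, min_len)
            if overlaps:
                result[(i, j)] = overlaps
    return result
```

For example, `suffix_prefix_overlaps("ACGTACGT", "ACGTTTTT", 0.0, min_len=3)` reports only the exact overlap of length 4. This sketch compares every ordered pair of reads directly, so its running time grows with the product of the read lengths over all pairs; the abstract's index-based techniques are motivated by the need to avoid exactly this cost at the scale of millions of reads.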