Approximate all-pairs suffix/prefix overlaps

Authors:
Niko Välimäki;Susana Ladra;Veli Mäkinen
Affiliations:
Department of Computer Science, University of Helsinki, Finland;Department of Computer Science, University of A Coruña, Spain;Department of Computer Science, University of Helsinki, Finland
Venue:
CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Year:
2010

Citing 13

Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing
Algorithms on strings, trees, and sequences: computer science and computational biology
A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM (JACM)
A guided tour to approximate string matching. ACM Computing Surveys (CSUR)
Bit-Parallel Witnesses and Their Applications to Approximate String Matching. Algorithmica
Indexing compressed text. Journal of the ACM (JACM)
Compressed full-text indexes. ACM Computing Surveys (CSUR)
Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms (TALG)
Compressed indexing and local alignment of DNA. Bioinformatics
Dynamic entropy-compressed sequences and full-text indexes. ACM Transactions on Algorithms (TALG)
Linear pattern matching algorithms. SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (SWAT 1973)
Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics
SOAP2. Bioinformatics

Cited 3

Approximate all-pairs suffix/prefix overlaps. Information and Computation
Computing the Burrows-Wheeler transform of a string and its reverse. CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Computing the Burrows-Wheeler transform of a string and its reverse in parallel. Journal of Discrete Algorithms

Abstract

Finding approximate overlaps is the first phase of many sequence assembly methods. Given a set of r strings of total length n and an error rate ε, the goal is to find, for all pairs of strings, their suffix/prefix matches (overlaps) that are within edit distance k = ⌈εl⌉, where l is the length of the overlap. We propose new solutions for this problem based on backward backtracking (Lam et al., 2008) and suffix filters (Kärkkäinen and Na, 2008). Our techniques use nH_k + o(n log σ) + r log r bits of space, where H_k is the k-th order entropy and σ is the alphabet size. In practice, the methods are easy to parallelize and scale up to millions of DNA reads.
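
To make the problem statement concrete, the following is a minimal brute-force sketch for a single pair of strings. It is not the indexed method of the paper (no backward backtracking, suffix filters, or compressed index); it is a plain quadratic dynamic program. The function name `approx_overlaps`, the `min_len` parameter, and the choice to measure the overlap length l as the length of the matched prefix are all assumptions made for illustration.

```python
from math import ceil

def approx_overlaps(a, b, eps, min_len=1):
    """Illustrative baseline: return (l, d) pairs such that some suffix of
    `a` is within edit distance d <= ceil(eps * l) of the prefix b[:l].
    Assumption: l is the length of the matched prefix of b."""
    m = len(b)
    # prev[j] = min edit distance between some suffix of a[:i] and b[:j].
    # For i = 0 only the empty suffix exists, so matching b[:j] costs j.
    prev = list(range(m + 1))
    for i in range(1, len(a) + 1):
        # cur[0] = 0: the suffix of a may start at any position for free.
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            cur[j] = min(
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # match / substitution
                prev[j] + 1,                           # extra character in a
                cur[j - 1] + 1,                        # extra character in b
            )
        prev = cur
    # prev now describes suffixes ending exactly at the end of a.
    # Note: ceil(eps * l) uses floating point; exact rationals may be safer.
    return [(l, prev[l]) for l in range(min_len, m + 1)
            if prev[l] <= ceil(eps * l)]

# Exact overlap "TACA" between the end of the first read and the start of the second.
print(approx_overlaps("GATTACA", "TACAGG", 0.0))  # [(4, 0)]
```

With ε > 0, very short prefixes satisfy the threshold trivially (a single edit is always allowed once ⌈εl⌉ ≥ 1), which is why this sketch takes a minimum overlap length. Running it over all ordered pairs of r strings gives the all-pairs result, at a cost that grows quadratically with the input and that motivates index-based approaches.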