Approximate all-pairs suffix/prefix overlaps

Authors:
Niko Välimäki;Susana Ladra;Veli Mäkinen
Affiliations:
Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, P.O. Box 68, 00014, Finland;Department of Computer Science, University of A Coruña, Spain;Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, P.O. Box 68, 00014, Finland
Venue:
Information and Computation
Year:
2012

Citing 19

Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing
Algorithms on strings, trees, and sequences: computer science and computational biology
A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM (JACM)
A guided tour to approximate string matching. ACM Computing Surveys (CSUR)
Dictionary matching and indexing with errors and don't cares. STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Bit-Parallel Witnesses and Their Applications to Approximate String Matching. Algorithmica
Indexing compressed text. Journal of the ACM (JACM)
The fragment assembly string graph. Bioinformatics
Compressed full-text indexes. ACM Computing Surveys (CSUR)
Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms (TALG)
Compressed indexing and local alignment of DNA. Bioinformatics
Dynamic entropy-compressed sequences and full-text indexes. ACM Transactions on Algorithms (TALG)
Linear pattern matching algorithms. SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics
SOAP2. Bioinformatics
Efficient algorithms for the all-pairs suffix-prefix problem and the all-pairs substring-prefix problem. Information Processing Letters
Efficient construction of an assembly string graph using the FM-index. Bioinformatics
Approximate all-pairs suffix/prefix overlaps. CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Unified view of backward backtracking in short read mapping. Algorithms and Applications

Cited 1

Least random suffix/prefix matches in output-sensitive time. CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching

Abstract

Finding approximate overlaps is the first phase of many sequence assembly methods. Given a set of strings of total length $n$ and an error rate $\varepsilon$, the goal is to find, for all pairs of strings, their suffix/prefix matches (overlaps) that are within edit distance $k = \lceil \varepsilon \ell \rceil$, where $\ell$ is the length of the overlap. We propose a new solution for this problem based on backward backtracking (Lam et al., 2008) and suffix filters (Kärkkäinen and Na, 2008). Our technique uses $nH_k + o(n \log \sigma) + r \log r$ bits of space, where $H_k$ is the $k$-th order entropy and $\sigma$ the alphabet size. In practice, it is more scalable than q-gram filters (Rasmussen et al., 2006) in terms of space and comparable to them in terms of time. Our method is also easy to parallelize and scales up to millions of DNA reads.
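
To make the problem statement concrete, the following is a minimal brute-force sketch of the overlap definition only. It is not the backward-backtracking and suffix-filter method the abstract proposes, and it uses no compressed index. The function names, the `min_len` parameter, and the choice to measure the overlap length as the length of the matched prefix are assumptions made for illustration.

```python
import math


def approx_suffix_prefix_overlaps(a, b, eps, min_len=1):
    """Return (ell, dist) pairs such that some suffix of `a` is within
    edit distance ceil(eps * ell) of the prefix b[:ell].

    Assumption: the overlap length ell is taken to be the prefix length.
    """
    n, m = len(a), len(b)
    # Row 0 of the DP table: the empty part of `a` against b[:j] costs j.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        # cur[0] = 0 lets the matched suffix of `a` start at any position.
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            cur[j] = min(
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # match or substitution
                prev[j] + 1,                           # character of a unmatched
                cur[j - 1] + 1,                        # character of b unmatched
            )
        prev = cur
    # prev[j] is now the minimum edit distance between b[:j] and any suffix of a.
    hits = []
    for ell in range(min_len, m + 1):
        # Small tolerance guards against float round-off in eps * ell.
        k = math.ceil(eps * ell - 1e-9)
        if prev[ell] <= k:
            hits.append((ell, prev[ell]))
    return hits


def all_pairs_overlaps(reads, eps, min_len=1):
    """Apply the check above to every ordered pair of distinct reads."""
    result = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                hits = approx_suffix_prefix_overlaps(a, b, eps, min_len)
                if hits:
                    result[(i, j)] = hits
    return result
```

This baseline spends time proportional to the product of the two string lengths on every ordered pair, which is exactly the cost that index-based filtering aims to avoid on large read sets. It is useful mainly as a reference for checking the output of a faster method on small inputs.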