Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Algorithms on strings, trees, and sequences: computer science and computational biology
A fast bit-vector algorithm for approximate string matching based on dynamic programming
Journal of the ACM (JACM)
A guided tour to approximate string matching
ACM Computing Surveys (CSUR)
Compressed representations of sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
Compressed indexing and local alignment of DNA
Bioinformatics
Dynamic entropy-compressed sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
Linear pattern matching algorithms
SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Approximate all-pairs suffix/prefix overlaps
Information and Computation
Computing the Burrows-Wheeler transform of a string and its reverse
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Computing the Burrows-Wheeler transform of a string and its reverse in parallel
Journal of Discrete Algorithms
Finding approximate overlaps is the first phase of many sequence assembly methods. Given a set of r strings of total length n and an error rate ε, the goal is to find, for all pairs of strings, their suffix/prefix matches (overlaps) that are within edit distance k = ⌈εl⌉, where l is the length of the overlap. We propose new solutions for this problem based on backward backtracking (Lam et al., 2008) and suffix filters (Kärkkäinen and Na, 2008). The techniques use nH_k + o(n log σ) + r log r bits of space, where H_k is the k-th order entropy and σ is the alphabet size. In practice, the methods are easy to parallelize and scale up to millions of DNA reads.
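To make the problem statement concrete, the following is a minimal brute-force sketch in Python for a single ordered pair of strings. It is not the backward-backtracking or suffix-filter method of the paper, and it uses no compressed index; it only illustrates the threshold k = ⌈εl⌉ with a plain dynamic-programming edit distance. The function names, the `min_len` parameter, and the choice to take l as the length of the suffix of the first string are assumptions made for illustration.

```python
from math import ceil


def edit_distance_to_prefixes(s, t):
    """Return a list d where d[j] is the edit distance between s and t[:j]."""
    prev = list(range(len(t) + 1))  # distances from the empty string to t[:j]
    for i, cs in enumerate(s, start=1):
        cur = [i]  # distance from s[:i] to the empty prefix of t
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # match or substitute
        prev = cur
    return prev


def approximate_overlaps(a, b, eps, min_len=1):
    """Brute-force suffix/prefix overlaps between a suffix of a and a prefix of b.

    Assumption: the overlap length l is the length of the suffix of a.
    Returns (l, prefix_length_in_b, distance) for every l >= min_len whose
    best-matching prefix of b is within edit distance ceil(eps * l).
    """
    results = []
    for l in range(min_len, len(a) + 1):
        suffix = a[len(a) - l:]
        row = edit_distance_to_prefixes(suffix, b)
        k = ceil(eps * l)  # error threshold from the problem statement
        best_j = min(range(len(row)), key=row.__getitem__)
        if row[best_j] <= k:
            results.append((l, best_j, row[best_j]))
    return results
```

For example, `approximate_overlaps("ACGTACGT", "ACGTTTTT", 0.0, min_len=3)` reports the single exact overlap of length 4. This sketch takes time cubic in the string lengths per pair, which is the kind of cost that index-based approaches are intended to avoid.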