With the advent of new sequencing technologies that produce enormous quantities of short genomic sequences, new tools have emerged for locating these sequences inside a genomic reference. Because of chemical reading errors and the variability between organisms, one is interested in finding not only exact occurrences but also occurrences with up to k mismatches. The contribution of this paper is twofold. First, we present a generalization of the classical Rabin-Karp string matching algorithm that solves the k-mismatch problem with average complexity O(n+m), where n and m are the text and pattern lengths, respectively. Second, we show how to combine this idea with an index over the text, so that a pattern can be searched for, with up to k mismatches, in time proportional to its length. This novel tool, rNA (randomized Numerical Aligner), is in general faster and more accurate than other available tools such as SOAP2, BWA, and BOWTIE. rNA executables and source code are freely available at http://iga-rna.sourceforge.net/.
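To make the first contribution more concrete, the following is a minimal Python sketch of one way a Rabin-Karp fingerprint can be used as a filter for k mismatches. It is an illustration under stated assumptions, not the construction used in rNA: the function names `hamming` and `k_mismatch_search`, the base 256, the modulus 2^31 - 1, and the precomputed set of admissible fingerprint differences are all choices made here for the example. The idea is that if a text window and the pattern differ in at most k positions, their fingerprints differ by a sum of at most k terms of the form (a - b) * base^j, so any window whose fingerprint difference falls outside that set can be discarded without comparing characters. The sketch makes no claim to reach the O(n+m) average bound stated above.

```python
def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def k_mismatch_search(text, pattern, k, base=256, mod=(1 << 31) - 1):
    """Return start positions where pattern occurs in text with <= k mismatches.

    Hypothetical helper for illustration; base and mod are assumed values.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of each pattern position in the polynomial fingerprint.
    powers = [pow(base, m - 1 - i, mod) for i in range(m)]

    # Fingerprint change caused by substituting one character at one position.
    codes = sorted({ord(c) for c in text + pattern})
    deltas = {(a - b) % mod for a in codes for b in codes if a != b}
    single = {(d * p) % mod for d in deltas for p in powers}

    # Superset of all fingerprint differences reachable with <= k mismatches.
    # Its size grows quickly with k, so this only suits small k and short patterns.
    allowed = {0}
    for _ in range(k):
        allowed |= {(z + s) % mod for z in allowed for s in single}

    def fingerprint(s):
        h = 0
        for c in s:
            h = (h * base + ord(c)) % mod
        return h

    h_p = fingerprint(pattern)
    h_t = fingerprint(text[:m])
    hits = []
    for i in range(n - m + 1):
        # Cheap filter first; verify candidates explicitly to rule out collisions.
        if (h_t - h_p) % mod in allowed and hamming(text[i:i + m], pattern) <= k:
            hits.append(i)
        if i + m < n:
            # Roll the window: drop text[i], append text[i + m].
            h_t = ((h_t - ord(text[i]) * powers[0]) * base + ord(text[i + m])) % mod
    return hits


print(k_mismatch_search("ACGTACGTTGCA", "ACGA", 1))
```

With k = 0 the admissible set contains only zero and the sketch reduces to classical Rabin-Karp with verification. Because every candidate is verified with an explicit Hamming distance check, the filter can only affect running time, never correctness.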