Fast parallel and serial approximate string matching
Journal of Algorithms
Approximate string matching: a simpler faster algorithm
Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Faster algorithms for string matching with k mismatches
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
Provably sensitive Indexing strategies for biosequence similarity search
Proceedings of the sixth annual international conference on Computational biology
Designing seeds for similarity search in genomic DNA
RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology
FLASH: A Fast Look-Up Algorithm for String Homology
Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology
Efficient approximate and dynamic matching of patterns using a labeling paradigm
FOCS '96 Proceedings of the 37th Annual Symposium on Foundations of Computer Science
Better filtering with gapped q-grams
Fundamenta Informaticae - Special issue on computing patterns in strings
Designing multiple simultaneous seeds for DNA similarity search
RECOMB '04 Proceedings of the eighth annual international conference on Resaerch in computational molecular biology
On spaced seeds for similarity search
Discrete Applied Mathematics
Estimating Seed Sensitivity on Homogeneous Alignments
BIBE '04 Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering
Graph connectivity, partial words, and a theorem of Fine and Wilf
Information and Computation
Hi-index | 0.00 |
Filtering is a standard technique for fast approximate string matching in practice.In filtering, a quick first step is used to rule out almost all positions of a text as possible starting positions for a pattern. Typically this step consists of finding the exact matches of small parts of the pattern. In the followup step, a slow method is used to verify or eliminate each remaining position. The running time of such a method depends largely on the quality of the filtering step, as measured by its false positives rate. The quality of such a method depends on the number of true matches that it misses, that is, on its false negative rate. A spaced seed is a recently introduced type of filter pattern that allows gaps (i.e. don’t cares) in the small sub-pattern to be searched for. Spaced seeds promise to yield a much lower false positives rate, and thus have been extensively studied, though heretofore only heuristically or statistically. In this paper, we show how to optimally design spaced seeds that yield no false negatives.