Better Filtering with Gapped q-Grams
Fundamenta Informaticae - Computing Patterns in Strings
We have recently shown that q-gram filters based on gapped q-grams instead of the usual contiguous q-grams can provide orders of magnitude faster and/or more efficient filtering for the Hamming distance. In this paper, we extend the results to the Levenshtein distance, which is more problematic for gapped q-grams because an insertion or deletion in a gap affects a q-gram while a replacement does not. To keep this effect under control, we concentrate on gapped q-grams with just one gap. We demonstrate with experiments that the resulting filters provide a significant improvement over the contiguous q-gram filters. We also develop new techniques for dealing with complex q-gram filters.
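The difference between a replacement and an insertion or deletion inside a gap can be illustrated with a small sketch. It is not the paper's filter algorithm; the function name `gapped_qgrams` and the shape notation (`#` for a sampled position, `-` for a gap position) are assumptions made for this illustration only.

```python
def gapped_qgrams(text, shape):
    """Return the gapped q-grams of `text` for the given shape.

    `shape` is a hypothetical notation: '#' marks a sampled position
    and '-' marks a gap position, so "##-#" is a 3-gram with one gap.
    """
    # Offsets of the sampled positions within the shape.
    offsets = [i for i, c in enumerate(shape) if c == "#"]
    span = len(shape)
    # Slide the shape over the text and read only the sampled positions.
    return [
        "".join(text[start + off] for off in offsets)
        for start in range(len(text) - span + 1)
    ]


shape = "##-#"  # one gap, as in the filters considered here

original = gapped_qgrams("abcdefgh", shape)[0]     # samples a, b, d
replaced = gapped_qgrams("abXdefgh", shape)[0]     # 'c' in the gap replaced
deleted = gapped_qgrams("abdefgh", shape)[0]       # 'c' in the gap deleted
inserted = gapped_qgrams("abcXdefgh", shape)[0]    # 'X' inserted after the gap character

print(original, replaced, deleted, inserted)
```

In this example the q-gram starting at position 0 survives the replacement unchanged, while the deletion and the insertion both shift the character sampled after the gap and so produce a different q-gram.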