A popular and well-studied class of filters for approximate string matching compares substrings of length q, the q-grams, in the pattern and the text to identify text areas that contain potential matches. A generalization of the method that uses gapped q-grams instead of contiguous substrings has been mentioned a few times in the literature but has never been analyzed in any depth. In this paper, we report the first results of a study on gapped q-grams. We show that gapped q-grams can provide orders of magnitude faster and/or more efficient filtering than contiguous q-grams. To achieve these results, the arrangement of the gaps in the q-gram and a filter parameter called the threshold have to be optimized. Both of these tasks are nontrivial combinatorial optimization problems for which we present efficient solutions. We concentrate on the k mismatches problem, i.e., approximate string matching with the Hamming distance.
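
To make the role of the gap arrangement and the threshold concrete, here is a minimal Python sketch of a gapped q-gram filter for the k mismatches problem. It is an illustration under stated assumptions, not the method of the paper: the threshold is found by brute force over all mismatch placements (exponential, usable only for short patterns), whereas the abstract refers to efficient solutions that are not reproduced here. The shape notation (`#` for a compared position, `.` for a gap) and the names `shape_offsets`, `threshold`, `hamming`, and `find_k_mismatch` are hypothetical choices made for this example.

```python
from itertools import combinations


def shape_offsets(shape):
    """Turn a shape string such as '##.#' into compared offsets [0, 1, 3]."""
    return [i for i, c in enumerate(shape) if c == "#"]


def threshold(shape, m, k):
    """Brute-force threshold (illustrative only, exponential in k).

    Returns the smallest number of shape placements in a length-m pattern
    that stay untouched over every way of placing k mismatches.
    """
    offsets = shape_offsets(shape)
    starts = range(m - len(shape) + 1)
    best = len(starts)
    for mismatches in combinations(range(m), k):
        bad = set(mismatches)
        surviving = sum(
            1 for i in starts if all(i + o not in bad for o in offsets)
        )
        best = min(best, surviving)
    return best


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_k_mismatch(pattern, text, shape, k):
    """Filter each text window by shared gapped q-grams, then verify it.

    A window is a candidate if at least `threshold` aligned gapped q-grams
    agree with the pattern; only candidates are checked with Hamming distance.
    """
    offsets = shape_offsets(shape)
    m = len(pattern)
    starts = range(m - len(shape) + 1)
    t = threshold(shape, m, k)
    hits = []
    for j in range(len(text) - m + 1):
        shared = sum(
            1
            for i in starts
            if all(pattern[i + o] == text[j + i + o] for o in offsets)
        )
        if shared >= t and hamming(pattern, text[j:j + m]) <= k:
            hits.append(j)
    return hits


# For m = 11 and k = 3 the contiguous shape '###' has threshold 0, so it
# filters nothing, while the gapped shape '##.#' has threshold 1.
contiguous_t = threshold("###", 11, 3)
gapped_t = threshold("##.#", 11, 3)
```

The sketch scans every window directly to keep the logic visible. A practical filter would instead look up q-grams in an index and count hits per text region, which is where a higher threshold pays off.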