A popular and well-studied class of filters for approximate string matching compares substrings of length q, the q-grams, in the pattern and the text to identify text areas that contain potential matches. A generalization of the method that uses gapped q-grams instead of contiguous substrings has been mentioned a few times in the literature but has never been analyzed in any depth. In this paper, we report the first results of a study on gapped q-grams. We show that gapped q-grams can provide filtering that is orders of magnitude faster and/or more efficient than filtering with contiguous q-grams. To achieve these results, the arrangement of the gaps in the q-gram and a filter parameter called the threshold have to be optimized. Both of these tasks are nontrivial combinatorial optimization problems, for which we present efficient solutions. We concentrate on the k mismatches problem, i.e., approximate string matching with the Hamming distance.
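To make the roles of the gap arrangement and the threshold concrete, the following is a minimal sketch of a gapped q-gram filter for the k mismatches problem. It is an illustration under stated assumptions, not the algorithm of the paper: the shape notation ('#' for a compared position, '.' for a gap), the function names, and the brute-force threshold computation are all hypothetical choices made for this example, and the brute force is practical only for small patterns and small k.

```python
from itertools import combinations


def shape_offsets(shape):
    """Offsets of the compared positions in a shape such as '##.#' (hypothetical notation)."""
    return [i for i, c in enumerate(shape) if c == "#"]


def threshold(m, k, shape):
    """Smallest number of q-grams that survive any placement of k mismatches.

    Brute force over all mismatch placements; an assumption of this sketch,
    not an efficient method.
    """
    offsets = shape_offsets(shape)
    starts = range(m - len(shape) + 1)
    best = len(starts)
    for mismatches in combinations(range(m), k):
        bad = set(mismatches)
        # A q-gram survives if none of its compared positions is a mismatch.
        surviving = sum(
            1 for s in starts if not any(s + o in bad for o in offsets)
        )
        best = min(best, surviving)
    return best


def matching_qgrams(pattern, window, shape):
    """Count aligned q-gram positions where pattern and window agree."""
    offsets = shape_offsets(shape)
    starts = range(len(pattern) - len(shape) + 1)
    return sum(
        all(pattern[s + o] == window[s + o] for o in offsets) for s in starts
    )


def filter_positions(pattern, text, k, shape):
    """Text positions that pass the filter and still need verification."""
    m = len(pattern)
    t = threshold(m, k, shape)
    return [
        i
        for i in range(len(text) - m + 1)
        if matching_qgrams(pattern, text[i:i + m], shape) >= t
    ]


if __name__ == "__main__":
    pattern = "ACGTACGTAC"
    text = "TTTT" + "ACGTACGAAC" + "GGGG"  # one mismatch inside the occurrence
    print(threshold(len(pattern), 1, "###"))   # contiguous 3-grams
    print(threshold(len(pattern), 1, "##.#"))  # gapped 3-grams
    print(filter_positions(pattern, text, 1, "##.#"))
```

Any text window within Hamming distance k of the pattern shares at least threshold(m, k, shape) q-grams with it, so the filter never discards a true match; windows that pass still have to be verified. For a contiguous shape the brute force reproduces the familiar bound m - q + 1 - kq, while for gapped shapes the value depends on how the gaps are arranged, which is why both the shape and the threshold have to be optimized.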