A new approach to text searching
Communications of the ACM
Approximate string-matching with q-grams and maximal matches
Theoretical Computer Science - Selected papers of the Combinatorial Pattern Matching School
A fast bit-vector algorithm for approximate string matching based on dynamic programming
Journal of the ACM (JACM)
A guided tour to approximate string matching
ACM Computing Surveys (CSUR)
Indexing and Retrieval for Genomic Databases
IEEE Transactions on Knowledge and Data Engineering
Database indexing for large DNA and protein sequence collections
The VLDB Journal — The International Journal on Very Large Data Bases
A Database Index to Large Biological Sequences
Proceedings of the 27th International Conference on Very Large Data Bases
Approximate Pattern Matching with Samples
ISAAC '94 Proceedings of the 5th International Symposium on Algorithms and Computation
Indexing Text with Approximate q-Grams
COM '00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching
Approximate String Matching and Local Similarity
CPM '94 Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching
Filtration with q-Samples in Approximate String Matching
CPM '96 Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching
On Using q-Gram Locations in Approximate String Matching
ESA '95 Proceedings of the Third Annual European Symposium on Algorithms
A seriate coverage filtration approach for homology search
Proceedings of the 2004 ACM symposium on Applied computing
Approximate string matching with ordered q-grams
Nordic Journal of Computing
Estimating the selectivity of approximate string queries
ACM Transactions on Database Systems (TODS)
Optimal spaced seeds for faster approximate string matching
Journal of Computer and System Sciences
An Edit-Distance Model for the Approximate Matching of Timed Strings
IEEE Transactions on Pattern Analysis and Machine Intelligence
Fast bit-vector algorithms for approximate string matching under indel distance
SOFSEM'05 Proceedings of the 31st international conference on Theory and Practice of Computer Science
Hi-index | 0.00 |
In genomic databases, approximate string matching with k errors is often applied when searching genomic sequences, where k errors can be caused by substitution, insertion, or deletion operations. In this paper, we propose a new method, the hash trie filter, to efficiently support approximate string matching in genomic databases. First, we build a hash trie for indexing the genomic sequence stored in a database in advance. Then, we utilize an efficient technique to find the ordered subpatterns in the sequence, which could reduce the number of candidates by pruning some unreasonable matching positions. Moreover, our method will dynamically decide the number of ordered matching grams, resulting in the increase of precision. The simulation results show that the hash trie filter outperforms the well-known (k+s) q-samples filter in terms of the response time, the number of verified candidates, and the precision, under different lengths of the query patterns and different error levels.