Approximate string matching is the problem of finding occurrences of a given pattern in a text while allowing a bounded number of errors. In this paper we present a space-efficient data structure for the 1-mismatch and 1-difference problems. Given a text $T$ of length $n$ over a fixed alphabet $A$, we can preprocess $T$ into an $O(n\sqrt{\log n})$-bit data structure so that, for any query pattern $P$ of length $m$, all 1-mismatch (or 1-difference) occurrences of $P$ can be found in $O(m \log\log n + occ)$ time, where $occ$ is the number of occurrences. This is the fastest known query time among data structures that use $o(n \log^2 n)$ bits of space. The space can be further reduced to $O(n)$ bits at the cost of a slowdown factor of $\log^{\varepsilon} n$, for $0 < \varepsilon \le 1$. Furthermore, our solution generalizes to the $k$-mismatch (and $k$-difference) problem, with query time $O(|A|^k m^k (k + \log\log n) + occ)$ using an $O(n\sqrt{\log n})$-bit index, and $O(\log^{\varepsilon} n \, (|A|^k m^k (k + \log\log n) + occ))$ using an $O(n)$-bit index.
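To make the problem statement concrete, the following is a minimal sketch of what a k-mismatch query returns. It is a naive scan running in O(nm) time, not the indexed data structure the paper presents, and the function names `hamming_within` and `k_mismatch_occurrences` are illustrative assumptions rather than anything taken from the paper. It covers only mismatches (substitutions); the k-difference variant, which also allows insertions and deletions, is not shown.

```python
from typing import List


def hamming_within(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False  # stop early once the error budget is exceeded
    return True


def k_mismatch_occurrences(text: str, pattern: str, k: int = 1) -> List[int]:
    """Naive baseline (hypothetical helper, not the paper's index).

    Returns every 0-based start position i such that text[i:i+m]
    matches pattern with at most k mismatched characters.
    """
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if hamming_within(text[i:i + m], pattern, k)
    ]


if __name__ == "__main__":
    # "abda" differs from "abra" in exactly one position.
    print(k_mismatch_occurrences("abracadabra", "abda", k=1))
```

A scan like this touches every text position on each query, whereas an index is built once so that query time depends on the pattern length and the number of occurrences rather than on n.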