Improved approximate string matching using compressed suffix data structures

Authors:
Tak-Wah Lam; Wing-Kin Sung; Swee-Seong Wong
Affiliations:
Department of Computer Science, The University of Hong Kong, Hong Kong; School of Computing, National University of Singapore, Singapore; School of Computing, National University of Singapore, Singapore
Venue:
ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
Year:
2005

Abstract

Approximate string matching is the problem of finding occurrences of a given pattern in a text while allowing some degree of error. In this paper we present a space-efficient data structure for the 1-mismatch and 1-difference problems. Given a text $T$ of length $n$ over a fixed alphabet $A$, we can preprocess $T$ into an $O(n\sqrt{\log n})$-bit data structure so that, for any query pattern $P$ of length $m$, all 1-mismatch (or 1-difference) occurrences of $P$ can be found in $O(m \log\log n + occ)$ time, where $occ$ is the number of occurrences. This is the fastest known query time among data structures that use $o(n \log^2 n)$ bits of space. The space can be further reduced to $O(n)$ bits if we can afford a slowdown factor of $\log^{\varepsilon} n$, for $0 < \varepsilon \le 1$. Furthermore, our solution generalizes to the $k$-mismatch (and $k$-difference) problem, with query times $O(|A|^k m^k (k + \log\log n) + occ)$ and $O(\log^{\varepsilon} n \, (|A|^k m^k (k + \log\log n) + occ))$ using an $O(n\sqrt{\log n})$-bit and an $O(n)$-bit indexing data structure, respectively.
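To make the two problem definitions concrete, here is a naive brute-force reference in Python. It is a minimal sketch and is not the paper's index: it rescans the whole text for every query, which is exactly the cost a compressed index is meant to avoid. Two assumptions are made here: a $k$-mismatch occurrence is a length-$m$ window of $T$ with Hamming distance at most $k$ from $P$, and a $k$-difference occurrence is reported as the starting position of any substring of $T$ whose edit distance to $P$ is at most $k$. The function names are hypothetical.

```python
from typing import List


def k_mismatch_occurrences(text: str, pattern: str, k: int) -> List[int]:
    """Start positions i where text[i:i+m] differs from pattern in at most k places.

    Hypothetical helper; naive O(n*m) scan, not the paper's index.
    """
    m = len(pattern)
    result = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:  # early exit once the budget is exceeded
                    break
        if mismatches <= k:
            result.append(i)
    return result


def k_difference_occurrences(text: str, pattern: str, k: int) -> List[int]:
    """Start positions i such that some substring text[i:j] has edit distance <= k to pattern.

    Hypothetical helper; naive O(n*m*(m+k)) scan, not the paper's index.
    """
    m = len(pattern)
    result = []
    for i in range(len(text)):
        # A substring longer than m + k cannot be within edit distance k of pattern.
        window = text[i:i + m + k]
        # prev[t] = edit distance between pattern[:p] and window[:t], for the current p.
        prev = list(range(len(window) + 1))
        for p in range(1, m + 1):
            cur = [p] + [0] * len(window)
            for t in range(1, len(window) + 1):
                cost = 0 if pattern[p - 1] == window[t - 1] else 1
                cur[t] = min(prev[t] + 1,         # delete pattern[p-1]
                             cur[t - 1] + 1,      # insert window[t-1]
                             prev[t - 1] + cost)  # match or substitute
            prev = cur
        # prev[t] is now the edit distance between the full pattern and window[:t].
        if min(prev) <= k:
            result.append(i)
    return result
```

For example, with `text = "abcabd"` and `pattern = "abc"`, `k_mismatch_occurrences(text, pattern, 1)` returns `[0, 3]` (the windows `abc` and `abd`), while `k_difference_occurrences(text, pattern, 1)` returns `[0, 1, 3]`, because the substring `bc` starting at position 1 is one insertion away from `abc`.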