Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
Text indexing and dictionary matching with one error
Journal of Algorithms
Range Searching Over Tree Cross Products
ESA '00 Proceedings of the 8th Annual European Symposium on Algorithms
A Metric Index for Approximate String Matching
LATIN '02 Proceedings of the 5th Latin American Symposium on Theoretical Informatics
Opportunistic data structures with applications
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Dictionary matching and indexing with errors and don't cares
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
SIAM Journal on Computing
Journal of Discrete Algorithms
Compressed Suffix Trees with Full Functionality
Theory of Computing Systems
Linear pattern matching algorithms
SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
String indexing for patterns with wildcards
SWAT'12 Proceedings of the 13th Scandinavian conference on Algorithm Theory
This paper revisits the problem of indexing a text $S[1..n]$ for pattern matching with up to $k$ errors. A naive solution either has a worst-case matching time complexity of $\Omega(m^k)$ or requires $\Omega(n^k)$ space, where $m$ is the length of the pattern. Devising a solution with better performance has been a challenge until Cole et al. (2004) [5] showed an $O(n \log^k n)$-space index that can support $k$-error matching in $O(m + occ + \log^k n \log\log n)$ time, where $occ$ is the number of occurrences. Motivated by the indexing of long sequences like DNA, we have investigated the feasibility of devising a linear-size index that still has a time complexity linear in pattern length. This paper in particular presents an $O(n)$-space index that supports $k$-error matching in $O(m + occ + (\log n)^{k(k+1)} \log\log n)$ worst-case time. This index can be further compressed from $O(n)$ words into $O(n)$ bits with a slight increase in the time complexity.
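To make the problem statement concrete, the following is a minimal sketch of k-error matching done by scanning the text without any index. It is not the index described in this paper. It assumes that an "error" is a single-character insertion, deletion, or substitution (edit distance), and it uses the classic column-by-column dynamic program usually attributed to Sellers. The function name `k_error_end_positions` is hypothetical and chosen for illustration only.

```python
def k_error_end_positions(text, pattern, k):
    """Return the 1-based positions j such that some substring of `text`
    ending at j is within edit distance k of `pattern`.

    Assumption: errors are insertions, deletions, and substitutions.
    This is an O(n*m) scan of the text, not an index.
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]   # row i-1 of the previous column
        col[0] = 0           # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]     # row i of the previous column
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra character in the text
                         col[i - 1] + 1)    # pattern character missing from the text
            prev_diag = old
        if col[m] <= k:
            hits.append(j)
    return hits


if __name__ == "__main__":
    print(k_error_end_positions("abcdefg", "cxe", 1))
    print(k_error_end_positions("banana", "ana", 0))
```

The scan takes time proportional to $n \cdot m$ for every query, which is the cost an index aims to avoid: the bounds quoted above depend on $m$, $occ$, and polylogarithmic factors in $n$ rather than on $n$ itself.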