A linear size index for approximate pattern matching

Authors:
Ho-Leung Chan;Tak-Wah Lam;Wing-Kin Sung;Siu-Lung Tam;Swee-Seong Wong
Affiliations:
Department of Computer Science, University of Hong Kong, Hong Kong;Department of Computer Science, University of Hong Kong, Hong Kong;Department of Computer Science, National University of Singapore, Singapore;Department of Computer Science, University of Hong Kong, Hong Kong;Department of Computer Science, National University of Singapore, Singapore
Venue:
Journal of Discrete Algorithms
Year:
2011

Citing 12

Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing.
A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM (JACM).
Text indexing and dictionary matching with one error. Journal of Algorithms.
Range Searching Over Tree Cross Products. ESA '00 Proceedings of the 8th Annual European Symposium on Algorithms.
A Metric Index for Approximate String Matching. LATIN '02 Proceedings of the 5th Latin American Symposium on Theoretical Informatics.
Opportunistic data structures with applications. FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science.
Dictionary matching and indexing with errors and don't cares. STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing.
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM Journal on Computing.
Text indexing with errors. Journal of Discrete Algorithms.
Compressed Suffix Trees with Full Functionality. Theory of Computing Systems.
Improved Approximate String Matching Using Compressed Suffix Data Structures. Algorithmica.
Linear pattern matching algorithms. SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973).

Cited 1

String indexing for patterns with wildcards. SWAT'12 Proceedings of the 13th Scandinavian conference on Algorithm Theory.

Abstract

This paper revisits the problem of indexing a text $S[1..n]$ for pattern matching with up to $k$ errors. A naive solution either has a worst-case matching time complexity of $\Omega(m^k)$ or requires $\Omega(n^k)$ space, where $m$ is the length of the pattern. Devising a solution with better performance has been a challenge until Cole et al. (2004) [5] showed an $O(n \log^k n)$-space index that can support $k$-error matching in $O(m + occ + \log^k n \log\log n)$ time, where $occ$ is the number of occurrences. Motivated by the indexing of long sequences like DNA, we have investigated the feasibility of devising a linear-size index that still has a time complexity linear in pattern length. This paper in particular presents an $O(n)$-space index that supports $k$-error matching in $O(m + occ + (\log n)^{k(k+1)} \log\log n)$ worst-case time. This index can be further compressed from $O(n)$ words into $O(n)$ bits with a slight increase in the time complexity.
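
To make the problem statement concrete, the following is a minimal sketch of what a k-error query returns. It is not the index described in the paper: it scans the text directly, with no preprocessing, in O(nm) time. It also assumes that an "error" is a single-character substitution (Hamming distance); the abstract does not say which error model is used, and insertions and deletions are not handled here. The function name `k_mismatch_occurrences` is hypothetical, and positions are 0-based, whereas the abstract writes the text as S[1..n].

```python
def k_mismatch_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """Return the 0-based start positions where `pattern` matches `text`
    with at most `k` substitutions (assumed error model: Hamming distance).

    Brute-force baseline for illustration only; no index is built.
    """
    n, m = len(text), len(pattern)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for offset in range(m):
            if text[start + offset] != pattern[offset]:
                mismatches += 1
                if mismatches > k:
                    break  # too many errors at this alignment
        else:
            # The inner loop finished without exceeding k mismatches.
            hits.append(start)
    return hits


if __name__ == "__main__":
    # A short DNA-like text; the length of the returned list is occ.
    print(k_mismatch_occurrences("acgtacgt", "acga", 1))
```

An index of the kind the abstract describes aims to answer the same query without scanning the whole text, so that query time depends on m, occ, and a polylogarithmic term in n rather than on n itself.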