A linear size index for approximate pattern matching

Authors:
Ho-Leung Chan;Tak-Wah Lam;Wing-Kin Sung;Siu-Lung Tam;Swee-Seong Wong
Affiliations:
Department of Computer Science, University of Hong Kong;Department of Computer Science, University of Hong Kong;Department of Computer Science, National University of Singapore;Department of Computer Science, University of Hong Kong;Department of Computer Science, National University of Singapore
Venue:
CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
Year:
2006

Citing 11
Cited 9

Suffix arrays: a new method for on-line string searches

SIAM Journal on Computing
A Space-Economical Suffix Tree Construction Algorithm

Journal of the ACM (JACM)
Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)

STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Indexing and Dictionary Matching with One Error

WADS '99 Proceedings of the 6th International Workshop on Algorithms and Data Structures
Range Searching Over Tree Cross Products

ESA '00 Proceedings of the 8th Annual European Symposium on Algorithms
A Metric Index for Approximate String Matching

LATIN '02 Proceedings of the 5th Latin American Symposium on Theoretical Informatics
Opportunistic data structures with applications

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Dictionary matching and indexing with errors and don't cares

STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Compressed Suffix Trees with Full Functionality

Theory of Computing Systems
Linear pattern matching algorithms

SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Improved approximate string matching using compressed suffix data structures

ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation

Compressed full-text indexes

ACM Computing Surveys (CSUR)
Compressed indexes for approximate string matching

ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
Pattern matching with don't cares and few errors

Journal of Computer and System Sciences
Approximate string matching with Lempel-Ziv compressed indexes

SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval
A filtering algorithm for k-mismatch with don't cares

Information Processing Letters
Fast index for approximate string matching

Journal of Discrete Algorithms
Approximate String Processing

Foundations and Trends in Databases
Least random suffix/prefix matches in output-sensitive time

CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Cache-oblivious index for approximate string matching

CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching

Abstract

This paper revisits the problem of indexing a text S[1..n] to support searching for substrings of S that match a given pattern P[1..m] with at most k errors. A naive solution either has a worst-case matching time complexity of $\Omega(m^k)$ or requires $\Omega(n^k)$ space. Devising a solution with better performance has been a challenge until Cole et al. [5] showed an $O(n \log^k n)$-space index that can support k-error matching in $O(m + occ + \log^k n \log\log n)$ time, where occ is the number of occurrences. Motivated by the indexing of DNA, we investigate in this paper the feasibility of devising a linear-size index that still has a time complexity linear in m. In particular, we give an $O(n)$-space index that supports k-error matching in $O(m + occ + (\log n)^{k(k+1)} \log\log n)$ worst-case time. Furthermore, the index can be compressed from $O(n)$ words into $O(n)$ bits with a slight increase in the time complexity.
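To make the problem statement concrete, the following minimal sketch shows what a k-error query returns, using a plain index-free scan of the text. It is not the index described in the abstract: it is the classic column-wise dynamic program (often attributed to Sellers), which takes O(nm) time per query and is exactly the cost an index aims to avoid. The sketch assumes that "errors" means edit distance (insertions, deletions, substitutions); the function name `k_error_end_positions` is hypothetical and chosen only for illustration.

```python
def k_error_end_positions(text, pattern, k):
    """Return the 1-based end positions j such that some substring of
    text ending at j is within edit distance k of pattern.

    Assumption: "errors" are insertions, deletions and substitutions.
    This is an index-free O(n*m) scan, shown only to define the query.
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(
                prev_diag + cost,  # match or substitution
                old + 1,           # extra character in the text
                col[i - 1] + 1,    # pattern character left unmatched
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    # DNA-like toy example; with k = 0 this reduces to exact matching.
    print(k_error_end_positions("ACGTACGT", "ACT", 1))
    print(k_error_end_positions("ACGTACGT", "ACG", 0))
```

An index such as the one in the abstract answers the same query after preprocessing S once, so that the query time depends on m, occ and polylogarithmic factors in n rather than on n itself.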