A linear size index for approximate pattern matching

  • Authors:
  • Ho-Leung Chan; Tak-Wah Lam; Wing-Kin Sung; Siu-Lung Tam; Swee-Seong Wong

  • Affiliations:
  • Department of Computer Science, University of Hong Kong, Hong Kong; Department of Computer Science, University of Hong Kong, Hong Kong; Department of Computer Science, National University of Singapore, Singapore; Department of Computer Science, University of Hong Kong, Hong Kong; Department of Computer Science, National University of Singapore, Singapore

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2011

Abstract

This paper revisits the problem of indexing a text S[1..n] for pattern matching with up to k errors. A naive solution either has a worst-case matching time complexity of Ω(m^k) or requires Ω(n^k) space, where m is the length of the pattern. Devising a solution with better performance has been a challenge until Cole et al. (2004) [5] showed an O(n log^k n)-space index that can support k-error matching in O(m + occ + log^k n log log n) time, where occ is the number of occurrences. Motivated by the indexing of long sequences like DNA, we have investigated the feasibility of devising a linear-size index that still has a time complexity linear in pattern length. This paper in particular presents an O(n)-space index that supports k-error matching in O(m + occ + (log n)^{k(k+1)} log log n) worst-case time. This index can be further compressed from O(n) words into O(n) bits with a slight increase in the time complexity.
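
To make the problem statement concrete, the sketch below shows what "matching with up to k errors" can mean when no index is used at all. It is not the index described in the abstract: it is the classical dynamic-programming scan over the whole text, which takes O(nm) time per query and is the kind of cost an index is meant to avoid. It assumes that "errors" means edit distance (insertions, deletions, substitutions); the function name `k_error_end_positions` is hypothetical and chosen only for illustration.

```python
def k_error_end_positions(text: str, pattern: str, k: int) -> list[int]:
    """Return the 1-based positions j such that some substring of `text`
    ending at j is within edit distance k of `pattern`.

    Baseline only: scans the whole text in O(n*m) time, no index involved.
    """
    m = len(pattern)
    # prev[i] = minimum edit distance between pattern[:i] and any substring
    # of the text ending at the previous text position.
    prev = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        # curr[0] = 0 because an occurrence may start at any text position.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # text has an extra character
                curr[i - 1] + 1,     # text is missing a pattern character
            )
        if curr[m] <= k:
            ends.append(j)
        prev = curr
    return ends


if __name__ == "__main__":
    print(k_error_end_positions("abcdefg", "cde", 0))  # [5]
    print(k_error_end_positions("abcdefg", "cde", 1))  # [4, 5, 6]
```

With k = 0 only the exact occurrence ending at position 5 is reported; with k = 1 the substrings "cd" and "cdef" also qualify, so positions 4 and 6 appear as well.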