A linear size index for approximate pattern matching

  • Authors:
  • Ho-Leung Chan;Tak-Wah Lam;Wing-Kin Sung;Siu-Lung Tam;Swee-Seong Wong

  • Affiliations:
  • Department of Computer Science, University of Hong Kong;Department of Computer Science, University of Hong Kong;Department of Computer Science, National University of Singapore;Department of Computer Science, University of Hong Kong;Department of Computer Science, National University of Singapore

  • Venue:
  • CPM'06: Proceedings of the 17th Annual Conference on Combinatorial Pattern Matching
  • Year:
  • 2006

Abstract

This paper revisits the problem of indexing a text S[1..n] to support searching for substrings of S that match a given pattern P[1..m] with at most k errors. A naive solution either has a worst-case matching time of $\Omega(m^k)$ or requires $\Omega(n^k)$ space. Devising a solution with better performance remained a challenge until Cole et al. [5] showed an $O(n \log^k n)$-space index that supports k-error matching in $O(m + occ + \log^k n \log\log n)$ time, where occ is the number of occurrences. Motivated by the indexing of DNA, we investigate in this paper the feasibility of devising a linear-size index that still has a time complexity linear in m. In particular, we give an $O(n)$-space index that supports k-error matching in $O(m + occ + (\log n)^{k(k+1)} \log\log n)$ worst-case time. Furthermore, the index can be compressed from $O(n)$ words into $O(n)$ bits with a slight increase in the time complexity.
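
To make the query concrete, the following is a minimal sketch of what k-error matching reports. It is not the index described in the abstract. It is an index-free baseline that scans the whole text with the classic dynamic-programming recurrence for approximate matching, taking time proportional to n·m per query. It assumes that "errors" means edit distance (insertions, deletions, and substitutions); the function name `k_error_end_positions` and its parameters are illustrative and do not come from the paper.

```python
def k_error_end_positions(text, pattern, k):
    """Return the 1-based positions j in `text` at which some substring
    ending at j is within edit distance k of `pattern`.

    Baseline scan only (assumed edit-distance error model); this is not
    the linear-size index from the abstract.
    """
    m = len(pattern)
    # prev[i] = minimum edit distance between pattern[:i] and any
    # substring of the text ending at the previous position.
    # Before any text is read, matching pattern[:i] costs i edits.
    prev = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        # cur[0] = 0 because a match is allowed to start anywhere.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                cur[i - 1] + 1,      # pattern character left unmatched
            )
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


if __name__ == "__main__":
    print(k_error_end_positions("abcdefg", "bxd", 1))
    print(k_error_end_positions("banana", "ana", 0))
```

With k = 0 the scan reduces to exact matching. The point of an index such as the one in the abstract is to answer the same query without rescanning all of S each time.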