Compressed indexes for approximate string matching

  • Authors:
  • Ho-Leung Chan; Tak-Wah Lam; Wing-Kin Sung; Siu-Lung Tam; Swee-Seong Wong

  • Affiliations:
  • Department of Computer Science, University of Hong Kong; Department of Computer Science, University of Hong Kong; Department of Computer Science, National University of Singapore; Department of Computer Science, University of Hong Kong; Department of Computer Science, National University of Singapore

  • Venue:
  • ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
  • Year:
  • 2006

Abstract

We revisit the problem of indexing a string S[1..n] to support searching for all substrings of S that match a given pattern P[1..m] with at most k errors. Previous solutions either require an index of size exponential in k or need Ω(m^k) time for searching. Motivated by the indexing of DNA sequences, we investigate space-efficient indexes that occupy only O(n) space. For k = 1, we give an index to support matching in O(m + occ + log n log log n) time. The previously best solution achieving this time complexity requires an index of size O(n log n). This new index can be used to improve existing indexes for k ≥ 2 errors. Among others, it can support matching with k = 2 errors in O(m log n log log n + occ) time.
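
To make the problem statement concrete, the following is a minimal sketch of the query the index is meant to answer, written as an index-free baseline. It is not the data structure proposed in the paper. It assumes that "errors" means edit distance (insertions, deletions, substitutions), which the abstract does not spell out, and it uses the classical dynamic-programming scan in O(nm) time. The function name approx_end_positions is hypothetical and chosen only for illustration.

```python
def approx_end_positions(text, pattern, k):
    """Return the 1-based positions j in text at which some substring
    ending at j is within edit distance k of pattern.

    Index-free baseline (O(n*m) time), NOT the compressed index from
    the paper. Assumes "errors" means edit distance.
    """
    m = len(pattern)
    # prev[i] = smallest edit distance between pattern[:i] and any
    # substring of text ending at the previous text position.
    # Before reading any text, matching pattern[:i] costs i deletions.
    prev = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        # cur[0] = 0: a match may start at any text position for free.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or substitute
                prev[i] + 1,         # extra text character
                cur[i - 1] + 1,      # missing pattern character
            )
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


if __name__ == "__main__":
    print(approx_end_positions("abcdefg", "cde", 0))
    print(approx_end_positions("abcdefg", "cde", 1))
```

An index of the kind described in the abstract avoids scanning all of S for each query; this baseline only shows what a correct answer looks like for small inputs.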