Cache-oblivious index for approximate string matching

  • Authors and affiliations:
  • Wing-Kai Hon (Department of Computer Science, National Tsing Hua University, Taiwan)
  • Tak-Wah Lam (Department of Computer Science, The University of Hong Kong, Hong Kong)
  • Rahul Shah (Department of Computer Sciences, Purdue University, Indiana)
  • Siu-Lung Tam (Department of Computer Science, The University of Hong Kong, Hong Kong)
  • Jeffrey Scott Vitter (Department of Computer Sciences, Purdue University, Indiana)

  • Venue:
  • CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
  • Year:
  • 2007

Abstract

This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies O((n log^k n)/B) disk pages and finds all k-error matches with O((|P| + occ)/B + log^k n log log_B n) I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require Ω(|P| + occ + poly(log n)) I/Os. The second index reduces the space to O((n log n)/B) disk pages, and the I/O complexity is O((|P| + occ)/B + log^{k(k+1)} n log log n).
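
To make the notion of a k-error match concrete, the sketch below assumes the usual definition: a substring of T whose edit distance (insertions, deletions, substitutions) to P is at most k. It is the textbook dynamic-programming scan over the text, not either index described in the abstract, and it takes time proportional to n·|P| per query, which is the cost an index is meant to avoid. The function name k_error_match_ends is hypothetical and chosen only for illustration.

```python
from typing import List


def k_error_match_ends(text: str, pattern: str, k: int) -> List[int]:
    """Return every end position j (exclusive) such that some substring
    text[i:j] is within edit distance k of pattern.

    Assumption: "k-error match" means edit distance at most k.
    This is a brute-force baseline, not the paper's index.
    """
    m = len(pattern)
    # prev[i] = cheapest way to align pattern[:i] with a substring of the
    # text ending at the previous position. Before any text is read,
    # pattern[:i] must be matched against the empty string at cost i.
    prev = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        # curr[0] = 0 because a match may start at any text position.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                curr[i - 1] + 1,     # pattern character missing from the text
            )
        if curr[m] <= k:
            ends.append(j)
        prev = curr
    return ends


if __name__ == "__main__":
    print(k_error_match_ends("abcdefg", "cxe", 1))
    print(k_error_match_ends("banana", "ana", 0))
```

With k = 0 the scan reduces to exact matching. The number of reported positions corresponds to the occ term in the bounds above, up to the choice of reporting end positions rather than start positions.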