Faster compressed dictionary matching

  • Authors:
  • Wing-Kai Hon; Tsung-Han Ku; Rahul Shah; Sharma V. Thankachan; Jeffrey Scott Vitter

  • Affiliations:
  • Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan; Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan; Department of Computer Science, Louisiana State University, Baton Rouge, LA, USA; Department of Computer Science, Louisiana State University, Baton Rouge, LA, USA; Department of Computer Science, The University of Kansas, Lawrence, KS, USA

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2013

Abstract

Given a set $D$ of $d$ patterns, the dictionary matching problem is to index $D$ such that for any query text $T$, we can efficiently locate the occurrences of any pattern within $T$. When $D$ contains a total of $n$ characters drawn from an alphabet of size $\sigma$, Hon et al. (2008) [12] gave an $(nH_k(D) + o(n\log\sigma))$-bit index which supports a query in $O(|T|(\log^{\epsilon} n + \log d) + occ)$ time, where $\epsilon > 0$ and $H_k(D)$ denotes the $k$th-order entropy of $D$. Very recently, Belazzougui (2010) [3] proposed an elegant scheme, which takes $n\log\sigma + O(n)$ bits of index space and supports a query in optimal $O(|T| + occ)$ time. In this paper, we provide connections between Belazzougui's index and the XBW compression of Ferragina and Manzini (2005) [8], and show that Belazzougui's index can be slightly modified to be stored in $nH_k(D) + O(n)$ bits, while query time remains optimal; this improves the compressed index by Hon et al. (2008) [12] in both space and time.
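
To make the problem statement concrete, the following is a minimal Python sketch of dictionary matching using the classical, uncompressed Aho-Corasick automaton. It is not the compressed index described in the abstract and makes no attempt at the stated space bounds; it only illustrates what a query reports. The function names `build_automaton` and `find_occurrences` are hypothetical, and patterns are assumed to be non-empty strings. With hash-based transitions, a query is expected to take time proportional to the text length plus the number of reported occurrences.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and report links for the patterns."""
    goto = [{}]   # goto[s][ch] -> child node; node 0 is the root
    fail = [0]    # fail[s] -> node for the longest proper suffix of s that is in the trie
    out = [[]]    # out[s] -> indices of patterns that end exactly at node s
    for idx, pat in enumerate(patterns):
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(idx)

    # report[s] -> nearest node on the failure chain of s that ends a pattern (0 if none)
    report = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 nodes keep fail = 0
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            target = fail[child]
            report[child] = target if out[target] else report[target]
            queue.append(child)
    return goto, fail, out, report


def find_occurrences(patterns, text):
    """Return (start_position, pattern) for every occurrence of every pattern in text."""
    goto, fail, out, report = build_automaton(patterns)
    hits = []
    node = 0
    for pos, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        # Walk only nodes that actually end a pattern, so the work is charged to occ.
        s = node if out[node] else report[node]
        while s:
            for idx in out[s]:
                hits.append((pos - len(patterns[idx]) + 1, patterns[idx]))
            s = report[s]
    return hits


if __name__ == "__main__":
    print(find_occurrences(["he", "she", "his", "hers"], "ushers"))
```

Here the dictionary $D$ is the list of patterns and the query text $T$ is the string passed to `find_occurrences`; each reported pair corresponds to one of the $occ$ occurrences counted in the query-time bounds.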