Faster compressed dictionary matching

  • Authors:
  • Wing-Kai Hon;Tsung-Han Ku;Rahul Shah;Sharma V. Thankachan;Jeffrey Scott Vitter

  • Affiliations:
  • National Tsing Hua University, Taiwan;National Tsing Hua University, Taiwan;Louisiana State University, Louisiana;Louisiana State University, Louisiana;The University of Kansas, Kansas

  • Venue:
  • SPIRE'10: Proceedings of the 17th International Conference on String Processing and Information Retrieval
  • Year:
  • 2010

Abstract

Given a set D of d patterns, the dictionary matching problem is to index D such that, for any query text T, we can efficiently locate all occurrences in T of the patterns in D. When D contains a total of n characters drawn from an alphabet of size σ, Hon et al. (2008) gave an index of nH_k(D) + o(n log σ) bits that supports a query in O(|T|(log^ε n + log d) + occ) time, where ε > 0, H_k(D) denotes the kth-order entropy of D, and occ is the number of occurrences reported. Very recently, Belazzougui (2010) proposed an elegant scheme that takes n log σ + O(n) bits of index space and supports a query in optimal O(|T| + occ) time. In this paper, we provide connections between Belazzougui's index and the XBW compression of Ferragina et al. (2005), and show that Belazzougui's index can be slightly modified to be stored in nH_k(D) + O(n) bits while query time remains optimal; this improves on the compressed index of Hon et al. (2008) in both space and time.
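
To make the problem statement concrete, the following is a minimal sketch of dictionary matching using the classical Aho-Corasick automaton. It is not the compressed index described in the abstract: it stores the trie with ordinary Python dictionaries and lists, so it uses far more than nH_k(D) + O(n) bits. The function names `build_automaton` and `find_occurrences` are hypothetical and chosen only for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build an uncompressed Aho-Corasick automaton over the patterns."""
    # goto[s]: char -> child state; fail[s]: failure link of state s;
    # out[s]: indices of all patterns that end at state s.
    goto, fail, out = [{}], [0], [[]]
    for idx, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(idx)

    # Breadth-first traversal: failure links of shallower states are
    # final before deeper states are processed.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit patterns that end at the longest proper suffix state.
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def find_occurrences(patterns, text):
    """Return (start position, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for idx in out[s]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits


if __name__ == "__main__":
    print(find_occurrences(["he", "she", "his", "hers"], "ushers"))
```

With the dictionary D = {he, she, his, hers} and the text T = ushers, the sketch reports she at position 1 and both he and hers at position 2 (0-based). Assuming constant expected time per dictionary lookup, the scan over T takes O(|T| + occ) time, the same query bound the abstract calls optimal. The contribution described above lies in achieving that bound within entropy-compressed space.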