Compressed Index for Dictionary Matching

  • Authors:
  • Wing-Kai Hon;Tak-Wah Lam;Rahul Shah;Siu-Lung Tam;Jeffrey Scott Vitter

  • Affiliations:
  • -;-;-;-;-

  • Venue:
  • DCC '08 Proceedings of the Data Compression Conference
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

The past few years have witnessed several exciting results on compressed representation of a string T that supports efficient pattern matching, and the space complexity has been reduced to |T|H_k(T) + o(|T| log s) bits, where H_k(T) denotes the kth-order empirical entropy of T, and s is the size of the alphabet. In this paper, we study compressed representation of another classical problem of string indexing, which is called dictionary matching in the literature. Precisely, a collection D of strings (called patterns) of total length n is to be indexed so that given a text T, the occurrences of the patterns in T can be found efficiently. In this paper we show how to exploit a sampling technique to compress the existing O(n)-word index to an (nH_k(D) + o(n log s))-bit index with only a small sacrifice in search time.