String indexing for patterns with wildcards

  • Authors: Philip Bille; Inge Li Gørtz; Hjalte Wedel Vildhøj; Søren Vind

  • Affiliations: DTU Informatics, Technical University of Denmark, Denmark (all four authors)

  • Venue: SWAT'12: Proceedings of the 13th Scandinavian Conference on Algorithm Theory
  • Year: 2012

Abstract

We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results:

  • A linear space index with query time $O(m + \sigma^j \log\log n + \mathrm{occ})$. This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time $\Theta(jn)$ in the worst case.
  • An index with query time $O(m + j + \mathrm{occ})$ using space $O(\sigma^{k^2} n \log^k \log n)$, where k is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
  • A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].

Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
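
To make the problem statement concrete, the following is a minimal sketch of the query being answered, written as a naive scan rather than as any of the indexes described in the abstract. It assumes that a wildcard is written as `*` and matches exactly one arbitrary character; the symbol and the function name `find_occurrences` are illustrative choices, not taken from the paper. The scan takes time proportional to n times the pattern length per query, which is the kind of cost an index is meant to avoid.

```python
WILDCARD = "*"  # assumed wildcard symbol: matches exactly one arbitrary character


def find_occurrences(t: str, p: str) -> list[int]:
    """Return every start position i where p matches t[i:i+len(p)].

    Naive baseline for illustration only: no preprocessing of t,
    every alignment of p against t is checked character by character.
    """
    n, length = len(t), len(p)  # len(p) = m characters + j wildcards
    return [
        i
        for i in range(n - length + 1)
        if all(pc == WILDCARD or pc == tc for pc, tc in zip(p, t[i:i + length]))
    ]


if __name__ == "__main__":
    # Pattern with m = 3 characters and j = 1 wildcard.
    print(find_occurrences("abracadabra", "ab*a"))
```

Here the pattern `ab*a` has m = 3 and j = 1, and occ is the length of the returned list.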