Indexing with gaps

  • Authors: Moshe Lewenstein

  • Affiliations: Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel

  • Venue: SPIRE'11: Proceedings of the 18th International Conference on String Processing and Information Retrieval
  • Year: 2011

Abstract

In Indexing with Gaps one seeks to index a text so as to support pattern queries that contain gaps. Formally, a gapped pattern over an alphabet $\Sigma$ is a pattern of the form $p = p_1 g_1 p_2 g_2 \cdots g_\ell p_{\ell+1}$, where $p_i \in \Sigma^*$ for all $i$ and each $g_i \in \mathbb{N}$ is a gap length. Often one considers these patterns under bound constraints, for example, all gaps are bounded by a gap-bound $G$. Near-optimal solutions have recently been proposed for the case of a single gap of predetermined size, that is, for patterns of the form $p_1 \cdot g \cdot p_2$, where $g$ is known a priori. These solutions use $O(n \log^{\epsilon} n)$ preprocessing time and $O(n)$ space, and answer pattern queries in $O(|p_1| + |p_2|)$ time for constant-sized alphabets. For the more general case in which only a bound $G$ on the gap is given, these results can easily be adapted at the cost of a multiplicative factor of $O(G)$ in the preprocessing, i.e. $O(nG \log^{\epsilon} n)$ preprocessing time and $O(nG)$ preprocessing space. Alas, these solutions do not lend themselves to more than one gap. In this paper we propose a solution for $k$ gaps with preprocessing time $O(nG^{2k} \log^k n \log\log n)$, space $O(nG^{2k} \log^k n)$, and query time $O(m + 2^k \log\log n)$, where $m = \sum_{i=1}^{k+1} |p_i|$.
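To make the definition of a gapped pattern concrete, the following is a minimal sketch of a naive, index-free matcher. It is not the indexing solution proposed in the paper; it only illustrates what a query is asked to report. The function name `find_gapped` and its parameters are hypothetical, and the sketch assumes that each gap $g_i$ stands for exactly $g_i$ arbitrary text characters between consecutive subpatterns.

```python
def find_gapped(text, parts, gaps):
    """Naively report every start position of a gapped pattern in text.

    parts -- the subpatterns p_1, ..., p_{k+1} (hypothetical parameter name)
    gaps  -- the gap lengths g_1, ..., g_k; each gap is assumed to match
             exactly that many arbitrary characters
    """
    if len(parts) != len(gaps) + 1:
        raise ValueError("need exactly one more subpattern than gaps")

    # Total span of text covered by one occurrence of the gapped pattern.
    span = sum(len(p) for p in parts) + sum(gaps)

    hits = []
    for start in range(len(text) - span + 1):
        pos = start
        matched = True
        for j, part in enumerate(parts):
            # Each subpattern must appear verbatim at the current position.
            if not text.startswith(part, pos):
                matched = False
                break
            pos += len(part)
            # Skip over the gap that follows this subpattern, if any.
            if j < len(gaps):
                pos += gaps[j]
        if matched:
            hits.append(start)
    return hits


# One gap of length 1 between "ab" and "a".
print(find_gapped("abracadabra", ["ab", "a"], [1]))        # [0, 7]
# Two gaps (k = 2), so three subpatterns.
print(find_gapped("abracadabra", ["a", "a", "a"], [2, 1]))  # [0]
```

This scan takes time proportional to the text length for every query, which is precisely the cost an index is meant to avoid: the bounds quoted in the abstract depend on the text length $n$ only through a $\log\log n$ term at query time.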