On position restricted substring searching in succinct space

  • Authors:
  • Wing-Kai Hon; Rahul Shah; Sharma V. Thankachan; Jeffrey Scott Vitter

  • Affiliations:
  • Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan; Department of Computer Science, Louisiana State University, Baton Rouge, LA, USA; Department of Computer Science, Louisiana State University, Baton Rouge, LA, USA; Department of Electrical Engineering and Computer Science, University of Kansas, USA

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2012

Abstract

We study the position restricted substring searching (PRSS) problem, where the task is to index a text $T[0 \ldots n-1]$ of $n$ characters over an alphabet set $\Sigma$ of size $\sigma$, in order to answer the following: given a query pattern $P$ (of length $p$) and two indices $\ell$ and $r$, report all $occ_{\ell,r}$ occurrences of $P$ in $T[\ell \ldots r]$. Known indexes take $O(n \log n)$ bits or $O(n \log^{1+\epsilon} n)$ bits of space, and answer this query in $O(p + \log n + occ_{\ell,r} \log n)$ time or in optimal $O(p + occ_{\ell,r})$ time respectively, where $\epsilon$ is any positive constant. The main drawback of these indexes is their space requirement of $\Omega(n \log n)$ bits, which can be much more than the optimal $n \log \sigma$ bits needed to store the text $T$. This paper addresses an open question asked by Mäkinen and Navarro [LATIN, 2006], which is whether it is possible to design a succinct index answering PRSS queries efficiently. We first study the hardness of this problem and prove the following result: a succinct (or a compact) index cannot answer PRSS queries efficiently in the pointer machine model, and also not in the RAM model unless bounds on the well-researched orthogonal range query problem improve. However, for the special case of sufficiently long query patterns, that is for $p = \Omega(\log^{2+\epsilon} n)$, we derive an $|CSA_f| + |CSA_r| + o(n)$ bits index with optimal query time, where $|CSA_f|$ and $|CSA_r|$ are the space (in bits) of the compressed suffix arrays (with $O(p)$ time for pattern search) of T and T
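
To make the query in the problem statement concrete, the following is a minimal baseline sketch in Python, not the index described in the paper. It uses a plain (uncompressed) suffix array and filters the matching positions by the window, so it needs $\Omega(n \log n)$ bits and its query time is not output-sensitive. The function names `build_suffix_array` and `prss_query` are hypothetical, and the sketch assumes that an occurrence counts only if it lies entirely inside $T[\ell \ldots r]$.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Return the starting positions of all suffixes in lexicographic order.

    Naive construction by sorting suffixes; for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def prss_query(text, sa, pattern, lo, hi):
    """Report the start positions i of `pattern` with lo <= i and i + p - 1 <= hi.

    Assumption: an occurrence must fit entirely inside text[lo..hi] (inclusive).
    """
    p = len(pattern)
    # Suffixes truncated to p characters stay in sorted order, so the
    # suffixes that start with `pattern` form one contiguous range of sa.
    keys = [text[i:i + p] for i in sa]  # O(n * p) work, kept simple on purpose
    start = bisect_left(keys, pattern)
    end = bisect_right(keys, pattern)
    # Filtering the whole range is the non-succinct, non-optimal step:
    # it touches every occurrence in T, not only those inside the window.
    return sorted(i for i in sa[start:end] if lo <= i and i + p - 1 <= hi)


text = "abracadabra"
sa = build_suffix_array(text)
print(prss_query(text, sa, "abra", 0, 10))  # [0, 7]
print(prss_query(text, sa, "abra", 1, 10))  # [7]
print(prss_query(text, sa, "a", 3, 7))      # [3, 5, 7]
```

The filtering step is where this baseline falls short: restricting suffix array entries to a position window is a two-dimensional range reporting task, which is the connection to orthogonal range queries that the hardness result refers to.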