Indexing Factors with Gaps

  • Authors:
  • Costas S. Iliopoulos;M. Sohel Rahman

  • Affiliations:
  • King’s College London, Algorithm Design Group, Department of Computer Science, Strand, WC2R 2LS, London, England, UK;King’s College London, Algorithm Design Group, Department of Computer Science, Strand, WC2R 2LS, London, England, UK

  • Venue:
  • Algorithmica
  • Year:
  • 2009

Abstract

Indexing of factors, or substrings, is a widely used technique in stringology and serves as a tool for solving diverse text algorithmic problems. A gapped-factor is a concatenation of a factor of length k, a gap of length d and another factor of length k′. Such a gapped-factor is called a (k−d−k′)-gapped-factor. The problem of indexing gapped-factors was considered recently by Peterlongo et al. (In: Stringology, pp. 182–196, 2006). In particular, Peterlongo et al. devised a data structure, namely a gapped factor tree (GFT), to index the gapped-factors. Given a text $\mathcal{T}$ of length n over the alphabet Σ and the values of the parameters k, d and k′, the construction of GFT requires O(n|Σ|) time. Once GFT is constructed, a given (k−d−k′)-gapped-factor can be reported in O(k+k′+Occ) time, where Occ is the number of occurrences of that factor in $\mathcal{T}$. In this paper, we present a new, improved indexing scheme for gapped-factors. The improvements come from two aspects. Firstly, we generalize the indexing data structure in the sense that, unlike GFT, it is independent of the parameters k and k′. Secondly, our data structure can be constructed in O(n log^{1+ε} n) time and space, where 0 < ε < 1. The only price we pay is a slight increase, i.e. an additional log log n term, in the query time.
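
To make the definition of a (k−d−k′)-gapped-factor concrete, the following is a minimal Python sketch of a naive index. It is neither the GFT of Peterlongo et al. nor the data structure proposed in the paper, and it does not achieve the bounds quoted above; it only illustrates what "reporting the occurrences" of a gapped-factor means. The function names `build_naive_gapped_index` and `query`, the parameter name `k2` (standing in for k′), and the use of 0-based start positions are assumptions made for illustration.

```python
from collections import defaultdict


def build_naive_gapped_index(text, k, d, k2):
    """Naive illustration only (not the GFT, not the paper's structure).

    Maps each pair (left, right) to the 0-based start positions i where
    `left` (length k) occurs at i and `right` (length k2) occurs exactly
    d characters after the end of `left`.
    """
    index = defaultdict(list)
    span = k + d + k2  # total length covered by one gapped-factor
    for i in range(len(text) - span + 1):
        left = text[i:i + k]
        right = text[i + k + d:i + span]
        index[(left, right)].append(i)
    return index


def query(index, left, right):
    """Return all recorded start positions of the gapped-factor."""
    return index.get((left, right), [])


if __name__ == "__main__":
    idx = build_naive_gapped_index("abcabcabc", k=2, d=1, k2=2)
    print(query(idx, "ab", "ab"))  # positions where "ab", one-character gap, "ab" occurs
```

Unlike the scheme described in the abstract, this sketch fixes k, d and k′ at construction time and stores one entry per text position, so it is useful only as a reference for checking small examples by hand.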