Optimal prefix and suffix queries on texts
Information Processing Letters
Finding Patterns In Given Intervals
Fundamenta Informaticae
CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Indexing of factors (substrings) is a widely used technique in stringology and serves as a tool for solving diverse text-algorithmic problems. A gapped-factor is a concatenation of a factor of length k, a gap of length d and another factor of length k′. Such a gapped-factor is called a (k−d−k′)-gapped-factor. The problem of indexing gapped-factors was considered recently by Peterlongo et al. (In: Stringology, pp. 182–196, 2006). In particular, Peterlongo et al. devised a data structure, the gapped factor tree (GFT), to index the gapped-factors. Given a text $\mathcal{T}$ of length n over the alphabet Σ and the values of the parameters k, d and k′, the construction of the GFT requires O(n|Σ|) time. Once the GFT is constructed, the occurrences of a given (k−d−k′)-gapped-factor can be reported in O(k+k′+Occ) time, where Occ is the number of occurrences of that factor in $\mathcal{T}$. In this paper, we present a new, improved indexing scheme for gapped-factors. The improvements come from two aspects. Firstly, we generalize the indexing data structure in the sense that, unlike the GFT, it is independent of the parameters k and k′. Secondly, our data structure can be constructed in $O(n \log^{1+\varepsilon} n)$ time and space, where $0 < \varepsilon < 1$. The price is an additional term, depending on n, in the query time.
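To make the definition of a (k−d−k′)-gapped-factor concrete, the following is a minimal sketch of a naive scan that reports every starting position of a gapped-factor in a text. It is only a baseline for illustration, not the GFT of Peterlongo et al. nor the index proposed in this paper. The function name `gapped_factor_occurrences` and its parameters are hypothetical.

```python
def gapped_factor_occurrences(text, left, gap, right):
    """Return all start positions i such that text[i:] begins with
    `left`, followed by any `gap` characters, followed by `right`.

    Naive O(n * (k + k')) scan; hypothetical helper for illustration only.
    """
    if gap < 0:
        raise ValueError("gap length d must be non-negative")
    k, k_prime = len(left), len(right)
    span = k + gap + k_prime  # total length covered by one occurrence
    hits = []
    for i in range(len(text) - span + 1):
        # The first factor must match at i; the second must match
        # exactly `gap` characters after the first factor ends.
        if text[i:i + k] == left and text[i + k + gap:i + span] == right:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # (2-2-2)-gapped-factor: "ac", two arbitrary characters, "ac"
    print(gapped_factor_occurrences("acgtacgtacct", "ac", 2, "ac"))
```

Each query here rescans the whole text. An index such as the ones discussed above instead pays a preprocessing cost once so that a query depends on the factor lengths and the number of occurrences.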