Position-Restricted substring searching

Authors:
Veli Mäkinen;Gonzalo Navarro
Affiliations:
Department of Computer Science, University of Helsinki, Finland;Center for Web Research, Dept. of Computer Science, University of Chile, Chile
Venue:
LATIN'06 Proceedings of the 7th Latin American conference on Theoretical Informatics
Year:
2006

References: 12
Cited by: 20

Functional approach to data structures and its use in multidimensional searching

SIAM Journal on Computing
Suffix arrays: a new method for on-line string searches

SIAM Journal on Computing
Compact pat trees

Compact pat trees
An analysis of the Burrows-Wheeler transform

Journal of the ACM (JACM)
Succinct indexable dictionaries with applications to encoding k-ary trees and multisets

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
High-order entropy-compressed text indexes

SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Tables

Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science
Opportunistic data structures with applications

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
New data structures for orthogonal range searching

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
When indexing equals compression: experiments with compressing suffix arrays and applications

SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Indexing compressed text

Journal of the ACM (JACM)
Space-efficient static trees and graphs

SFCS '89 Proceedings of the 30th Annual Symposium on Foundations of Computer Science

Cited by:
Optimal prefix and suffix queries on texts

Information Processing Letters
Generalized Substring Compression

CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
Compressing and indexing labeled trees, with applications

Journal of the ACM (JACM)
Efficient Index for Retrieving Top-k Most Frequent Documents

SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Range Non-overlapping Indexing

ISAAC '09 Proceedings of the 20th International Symposium on Algorithms and Computation
Compression, indexing, and retrieval for massive string data

CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Efficient index for retrieving top-k most frequent documents

Journal of Discrete Algorithms
Finding Patterns In Given Intervals

Fundamenta Informaticae
Substring range reporting

CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
New algorithms on wavelet trees and applications to information retrieval

Theoretical Computer Science
Improved algorithms for the range next value problem and applications

Theoretical Computer Science
The wavelet trie: maintaining an indexed sequence of strings in compressed space

PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Wavelet trees for all

CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Computing Lempel-Ziv factorization online

MFCS'12 Proceedings of the 37th international conference on Mathematical Foundations of Computer Science
Finding patterns in given intervals

MFCS'07 Proceedings of the 32nd international conference on Mathematical Foundations of Computer Science
The wavelet matrix

SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Succinct representations of weighted trees supporting path queries

Journal of Discrete Algorithms
On position restricted substring searching in succinct space

Journal of Discrete Algorithms
Extracting powers and periods in a word from its runs structure

Theoretical Computer Science
Wavelet trees for all

Journal of Discrete Algorithms


Abstract

A full-text index is a data structure built over a text string T[1,n]. The most basic functionality provided is (a) counting how many times a pattern string P[1,m] appears in T and (b) locating all those occ positions. There exist several indexes that solve (a) in O(m) time and (b) in O(occ) time. In this paper we propose two new queries, (c) counting how many times P[1,m] appears in T[l,r] and (d) locating all those occ_{l,r} positions. These can be solved using (a) and (b) but this requires O(occ) time. We present two solutions to (c) and (d) in this paper. The first is an index that requires O(n log n) bits of space and answers (c) in O(m + log n) time and (d) in O(log n) time per occurrence (that is, O(occ_{l,r} log n) time overall). A variant of the first solution answers (c) in O(m + log log n) time and (d) in constant time per occurrence, but requires O(n log^{1+ε} n) bits of space for any constant ε > 0. The second solution requires O(nm log σ) bits of space, solving (c) in O(m⌈log σ / log log n⌉) time and (d) in O(m⌈log σ / log log n⌉) time per occurrence, where σ is the alphabet size. This second structure takes less space when the text is compressible. Our solutions can be seen as a generalization of rank and select dictionaries, which allow computing how many times a given character c appears in a prefix T[1,i] and also locate the i-th occurrence of c in T. Our solution to (c) extends character rank queries to substring rank queries, and our solution to (d) extends character select to substring select queries. As a byproduct, we show how rank queries can be used to implement fractional cascading in little space, so as to obtain an alternative implementation of a well-known two-dimensional range search data structure by Chazelle. We also show how Grossi et al.'s wavelet trees are suitable for two-dimensional range searching, and their connection with Chazelle's data structure.
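
To make queries (c) and (d) concrete, the following Python sketch reduces them to two-dimensional range searching: the suffix array interval of P gives one dimension, and the text positions allowed by [l,r] give the other. It is an illustration under stated assumptions, not the index described in the paper. The suffix array is built naively, the pattern search uses plain binary search (O(m log n) rather than O(m)), and the wavelet tree stores ordinary prefix-count lists where a real index would use succinct rank dictionaries. All class and function names are hypothetical. Positions are 0-based and inclusive, and an occurrence is counted only if it lies entirely inside T[l..r].

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a small illustration.
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(text, sa, pattern):
    """Half-open range [sp, ep) of suffix array entries that start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    sp, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sp, lo


class WaveletTree:
    """Wavelet tree over integers in [lo, hi); prefix counts play the role of rank."""

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if hi - lo <= 1 or not seq:
            return  # leaf (single value) or empty node
        mid = (lo + hi) // 2
        # zeros[i] = how many of the first i elements go to the left child
        self.zeros = [0]
        for v in seq:
            self.zeros.append(self.zeros[-1] + (v < mid))
        self.left = WaveletTree([v for v in seq if v < mid], lo, mid)
        self.right = WaveletTree([v for v in seq if v >= mid], mid, hi)

    def count(self, i, j, a, b):
        """Number of positions k in [i, j) whose value lies in [a, b)."""
        if i >= j or a >= b or b <= self.lo or self.hi <= a:
            return 0
        if a <= self.lo and self.hi <= b:
            return j - i
        zi, zj = self.zeros[i], self.zeros[j]
        return (self.left.count(zi, zj, a, b)
                + self.right.count(i - zi, j - zj, a, b))

    def report(self, i, j, a, b):
        """Yield the values in [a, b) stored at positions [i, j)."""
        if i >= j or a >= b or b <= self.lo or self.hi <= a:
            return
        if self.left is None:  # leaf: every entry equals self.lo
            yield from [self.lo] * (j - i)
            return
        zi, zj = self.zeros[i], self.zeros[j]
        yield from self.left.report(zi, zj, a, b)
        yield from self.right.report(i - zi, j - zj, a, b)


class PositionRestrictedIndex:
    def __init__(self, text):
        self.text = text
        self.sa = suffix_array(text)
        self.wt = WaveletTree(self.sa, 0, len(text))

    def count(self, pattern, l, r):
        """Query (c): occurrences of pattern that fit inside text[l..r]."""
        sp, ep = sa_interval(self.text, self.sa, pattern)
        # A start position p qualifies when l <= p <= r - m + 1.
        return self.wt.count(sp, ep, l, r - len(pattern) + 2)

    def locate(self, pattern, l, r):
        """Query (d): sorted start positions of those occurrences."""
        sp, ep = sa_interval(self.text, self.sa, pattern)
        return sorted(self.wt.report(sp, ep, l, r - len(pattern) + 2))
```

For example, with `idx = PositionRestrictedIndex("abracadabra")`, the call `idx.count("a", 2, 8)` returns 3 and `idx.locate("a", 2, 8)` returns `[3, 5, 7]`. The cost of `count` does not depend on how many occurrences P has outside the window, which is the point of treating the query as a range search instead of filtering the output of (b).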