Indexing Text Using the Ziv-Lempel Trie
SPIRE 2002 Proceedings of the 9th International Symposium on String Processing and Information Retrieval
Replacing suffix trees with enhanced suffix arrays
Journal of Discrete Algorithms - SPIRE 2002
Indexing text using the Ziv-Lempel trie
Journal of Discrete Algorithms - SPIRE 2002
A linear lower bound on index size for text retrieval
Journal of Algorithms - Special issue: Twelfth annual ACM-SIAM symposium on discrete algorithms
A categorization theorem on suffix arrays with applications to space efficient text indexes
SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
Succinct suffix arrays based on run-length encoding
Nordic Journal of Computing
Opportunistic data structures for range queries
COCOON'05 Proceedings of the 11th annual international conference on Computing and Combinatorics
Space-efficient construction of LZ-index
ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
Hi-index | 0.00 |
In this paper we design two compressed data structures for the full-text indexing problem. These data structures support efficient substring searches within the indexed text $T$ using roughly the same space required to store $T$ in compressed form. Our first compressed data structure retrieves the $occ$ occurrences of a pattern $P[1,p]$ in $T[1,n]$ in $O(p + occ\log^{1+\epsilon} n)$ time and uses at most $5n H_k(T) + o(n)$ bits of storage, where $H_k(T)$ is the $k$-th order empirical entropy of $T$. This space occupancy is $Theta(n)$ bits in the worst case and $o(n)$ bits for compressible texts. Our data structure exploits the relationship between the suffix array and the Burrows-Wheeler compression algorithm. Our second compressed data structure achieves $O(p+occ)$ query time and uses $O(n H_k(T)\log^\epsilon n) + o(n)$ bits of storage. In the worst case the space occupancy is $o(n\log n)$ bits which is asymptotically smaller than the space occupancy of suffix trees and suffix arrays. This second data structure exploits the interplay between two compressors: the Burrows-Wheeler algorithm and the LZ78 algorithm.