An indexed sequence of strings is a data structure that stores a string sequence and supports random access, searching, range counting, and analytics operations, for both exact matches and prefix search. String sequences lie at the core of column-oriented databases, log processing, and other storage and query tasks. In these applications each string can appear several times, and the order of the strings in the sequence is relevant. The prefix structure of the strings is relevant as well: common prefixes are used to extract interesting features from the sequence. Moreover, space efficiency is highly desirable because it translates directly into higher performance: more data fits in fast memory. We introduce and study the problem of compressed indexed sequences of strings, that is, representing indexed sequences of strings in nearly optimal compressed space, in both the static and dynamic settings, while preserving provably good performance for the supported operations. We present a new data structure for this problem, the Wavelet Trie, which combines the classical Patricia Trie with the Wavelet Tree, a succinct data structure for storing a compressed sequence. The resulting Wavelet Trie smoothly adapts to a sequence of strings that changes over time. It improves on state-of-the-art compressed data structures by supporting a dynamic alphabet (i.e., the set of distinct strings) and prefix queries, both crucial requirements in the aforementioned applications, and on traditional indexes by reducing space occupancy to close to the entropy of the sequence.
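To make the combination of a Patricia Trie with Wavelet Tree bitvectors more concrete, the following is a minimal static sketch in Python, not the authors' implementation. All names (`encode`, `encode_prefix`, `decode`, `WaveletTrie`, `rank`, `rank_prefix`) are illustrative assumptions. The bitvectors are plain Python lists with linear-time counting, so nothing here is compressed or dynamic; the sketch only shows how a query can be routed through the trie. It also assumes a non-empty input sequence and uses a simple prefix-free binary encoding chosen for this example.

```python
from os.path import commonprefix


def encode_prefix(p):
    """Encode a str as bits: '1' followed by 8 bits for every UTF-8 byte."""
    return "".join("1" + format(b, "08b") for b in p.encode("utf-8"))


def encode(s):
    """Encode a full string; the trailing '0' makes the code prefix-free."""
    return encode_prefix(s) + "0"


def decode(bits):
    """Invert encode(): read 9-bit blocks and ignore the '0' terminator."""
    data = bytes(int(bits[i + 1:i + 9], 2) for i in range(0, len(bits) - 1, 9))
    return data.decode("utf-8")


class WaveletTrie:
    """Illustrative static node: a Patricia-style label plus one bitvector."""

    def __init__(self, bitstrings):
        # alpha is the longest common prefix of all strings below this node.
        self.alpha = commonprefix(bitstrings)
        rest = [s[len(self.alpha):] for s in bitstrings]
        if rest[0] == "":
            # With a prefix-free code this means all strings are equal: a leaf.
            self.bits = None
            self.children = None
        else:
            # One bit per element: the branching bit that follows alpha.
            self.bits = [int(r[0]) for r in rest]
            self.children = [
                WaveletTrie([r[1:] for r in rest if r[0] == c]) for c in "01"
            ]

    def access(self, pos):
        """Return the encoded string stored at position pos."""
        if self.bits is None:
            return self.alpha
        b = self.bits[pos]
        child_pos = self.bits[:pos].count(b)  # naive stand-in for bitvector rank
        return self.alpha + str(b) + self.children[b].access(child_pos)

    def rank(self, s, pos):
        """Count occurrences of encoded string s among the first pos elements."""
        if not s.startswith(self.alpha):
            return 0
        s = s[len(self.alpha):]
        if self.bits is None:
            return pos if s == "" else 0
        if s == "":
            return 0
        b = int(s[0])
        return self.children[b].rank(s[1:], self.bits[:pos].count(b))

    def rank_prefix(self, p, pos):
        """Count elements among the first pos whose encoding starts with p."""
        if self.alpha.startswith(p):
            return pos  # p is exhausted here, so everything below matches
        if self.bits is None or not p.startswith(self.alpha):
            return 0
        p = p[len(self.alpha):]
        b = int(p[0])
        return self.children[b].rank_prefix(p[1:], self.bits[:pos].count(b))


# Example: a short log-like sequence with repeated strings and shared prefixes.
seq = ["/home/a", "/var/log", "/home/b", "/home/a", "/etc"]
trie = WaveletTrie([encode(s) for s in seq])
third = decode(trie.access(2))
exact = trie.rank(encode("/home/a"), 4)
by_prefix = trie.rank_prefix(encode_prefix("/home"), 5)
```

A range count over positions `[lo, hi)` follows from two rank calls, for example `trie.rank_prefix(p, hi) - trie.rank_prefix(p, lo)`. Replacing the list-based bitvectors with compressed rank/select bitvectors is what would bring the space close to the entropy of the sequence; that part is outside this sketch.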