We present an efficient algorithm for computing $q$-gram frequencies on a string represented in compressed form, namely as a straight-line program (SLP). Given an SLP $\mathcal{T}$ of size $n$ that represents a string $T$, the algorithm computes the occurrence frequencies of all $q$-grams in $T$ by reducing the problem to the weighted $q$-gram frequencies problem on a trie-like structure of size $m = |T|-\mathit{dup}(q,\mathcal{T})$, where $\mathit{dup}(q,\mathcal{T})$ is a quantity that represents the amount of redundancy that the SLP captures with respect to $q$-grams. The reduced problem can be solved in linear time. Since $m = O(qn)$, the running time of our algorithm is $O(\min\{|T|-\mathit{dup}(q,\mathcal{T}),qn\})$, which improves on our previous $O(qn)$ algorithm when $q=\Omega(|T|/n)$.
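To make the problem statement concrete, the following is a minimal Python sketch of counting $q$-grams directly on an SLP without decompressing it. It is not the trie-based algorithm of this paper. It illustrates the simpler boundary-counting idea that underlies bounds of the $O(qn)$ kind: every occurrence of a $q$-gram in $T$ crosses the boundary of exactly one rule $X \to YZ$ in the derivation tree, so it suffices to scan, for each rule, the last $q-1$ characters of $Y$ followed by the first $q-1$ characters of $Z$, weighted by how often $X$ occurs in the derivation tree. The rule encoding (a list whose entries are either a single character or a pair of earlier indices) and the names `qgram_frequencies` and `expand` are assumptions made for illustration, and the code is not tuned to meet the stated time bound.

```python
from collections import Counter


def qgram_frequencies(rules, q):
    """Count q-grams of the string derived by the last rule of an SLP.

    rules[i] is either a single character (terminal rule) or a pair
    (y, z) with y, z < i, meaning X_i -> X_y X_z. (Hypothetical encoding.)
    """
    n, k = len(rules), q - 1
    pre = [""] * n  # first min(|X_i|, q-1) characters of X_i
    suf = [""] * n  # last  min(|X_i|, q-1) characters of X_i
    for i, r in enumerate(rules):
        if isinstance(r, str):
            pre[i] = suf[i] = r[:k]
        else:
            y, z = r
            pre[i] = (pre[y] + pre[z])[:k]
            s = suf[y] + suf[z]
            suf[i] = s[max(0, len(s) - k):]

    # vocc[i]: number of nodes labelled X_i in the derivation tree.
    vocc = [0] * n
    vocc[n - 1] = 1
    for i in reversed(range(n)):
        if not isinstance(rules[i], str):
            y, z = rules[i]
            vocc[y] += vocc[i]
            vocc[z] += vocc[i]

    counts = Counter()
    for i, r in enumerate(rules):
        if vocc[i] == 0:
            continue  # rule not reachable from the start symbol
        if isinstance(r, str):
            if q == 1:
                counts[r] += vocc[i]
        else:
            y, z = r
            s = suf[y] + pre[z]  # every q-gram in s crosses the Y|Z boundary
            for j in range(len(s) - q + 1):
                counts[s[j:j + q]] += vocc[i]
    return counts


def expand(rules):
    """Decompress the SLP (used only to check the result on small inputs)."""
    out = []
    for r in rules:
        out.append(r if isinstance(r, str) else out[r[0]] + out[r[1]])
    return out[-1]


# An SLP with 6 rules deriving the Fibonacci word "abaababa".
rules = ["b", "a", (1, 0), (2, 1), (3, 2), (4, 3)]
freqs = qgram_frequencies(rules, 2)
```

On small inputs the result can be compared against a naive count over `expand(rules)`. For highly compressible texts the naive count needs time proportional to $|T|$, whereas the sketch only touches each rule once.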