Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
A Fast Algorithm for Discovering Optimal String Patterns in Large Text Databases
ALT '98 Proceedings of the 9th International Conference on Algorithmic Learning Theory
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Color Set Size Problem with Application to String Matching
CPM '92 Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching
Speeding Up Pattern Matching by Text Compression
CIAC '00 Proceedings of the 4th Italian Conference on Algorithms and Complexity
Collage system: a unifying framework for compressed pattern matching
Theoretical Computer Science - Selected papers in honour of Setsuo Arikawa
DASFAA '03 Proceedings of the Eighth International Conference on Database Systems for Advanced Applications
Offline Dictionary-Based Compression
DCC '99 Proceedings of the Conference on Data Compression
Application of Lempel--Ziv factorization to the approximation of grammar-based compression
Theoretical Computer Science
ACM Computing Surveys (CSUR)
Efficient algorithms to compute compressed longest common substrings and compressed palindromes
Theoretical Computer Science
Simple linear work suffix array construction
ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
IEEE Transactions on Information Theory
Processing compressed texts: a tractability border
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
Computing q-gram non-overlapping frequencies on SLP compressed texts
SOFSEM'12 Proceedings of the 38th international conference on Current Trends in Theory and Practice of Computer Science
Speeding up q-gram mining on grammar-based compressed texts
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Efficient LZ78 factorization of grammar compressed text
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Variable-Length codes for space-efficient grammar-based compression
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Fast q-gram mining on SLP compressed strings
Journal of Discrete Algorithms
Hi-index | 0.00 |
We present simple and efficient algorithms for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP of size n that represents string T, we present an O(qn) time and space algorithm that computes the occurrence frequencies of all q-grams in T. Computational experiments show that our algorithm and its variation are practical for small q, actually running faster on various real string data, compared to algorithms that work on the uncompressed text. We also discuss applications in data mining and classification of string data, for which our algorithms can be useful.