Fast q-gram mining on SLP compressed strings
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
We present simple and efficient algorithms for computing q-gram frequencies on strings given in compressed form, namely as a straight-line program (SLP). Given an SLP of size n that represents a string T, we present an algorithm that computes the occurrence frequencies of all q-grams in T in O(qn) time and space. Computational experiments show that the algorithm and a variation of it are practical for small q, and run faster on various real string data than algorithms that work on the uncompressed text. We also discuss applications in data mining and classification of string data, where our algorithms can be useful.
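To make the idea of counting q-grams without decompressing concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the SLP encoding (a list `rules` whose entries are either a single character or a pair of indices into earlier entries, with the last entry as the start symbol) and the name `qgram_frequencies` are hypothetical. The sketch relies on the observation that every occurrence of a q-gram in T crosses the boundary of exactly one variable X_i = X_l X_r in the derivation tree, so it is enough to scan the last q-1 characters of X_l followed by the first q-1 characters of X_r, weighted by how often X_i occurs in the tree. It uses a hash map and string slicing for simplicity and does not attempt to match the O(qn) bound stated in the abstract.

```python
from collections import Counter


def qgram_frequencies(rules, q):
    """Count all q-grams of the string represented by an SLP.

    Assumed (hypothetical) encoding: rules[i] is either a single character
    (terminal) or a pair (l, r) with l, r < i meaning X_i -> X_l X_r.
    The last rule is the start symbol.
    """
    n = len(rules)
    k = q - 1

    # vocc[i]: number of nodes labelled X_i in the derivation tree.
    # Parents always have larger indices, so a top-down pass suffices.
    vocc = [0] * n
    vocc[-1] = 1
    for i in range(n - 1, -1, -1):
        if isinstance(rules[i], tuple):
            l, r = rules[i]
            vocc[l] += vocc[i]
            vocc[r] += vocc[i]

    # pre[i] / suf[i]: first / last min(k, |X_i|) characters of X_i.
    pre = [""] * n
    suf = [""] * n
    counts = Counter()

    for i, rule in enumerate(rules):
        if isinstance(rule, tuple):
            l, r = rule
            # Each half is shorter than q, so every q-gram in this window
            # crosses the boundary between X_l and X_r.
            window = suf[l] + pre[r]
            if vocc[i]:
                for j in range(len(window) - q + 1):
                    counts[window[j:j + q]] += vocc[i]
            head = pre[l] + pre[r]
            tail = suf[l] + suf[r]
            pre[i] = head[:k]
            suf[i] = tail[max(0, len(tail) - k):]
        else:
            # A terminal is itself a 1-gram; for q >= 2 it contributes
            # only through the windows of its ancestors.
            if q == 1 and vocc[i]:
                counts[rule] += vocc[i]
            pre[i] = rule[:k]
            suf[i] = rule[max(0, len(rule) - k):]

    return counts
```

For example, the rules `["a", "b", (0, 1), (2, 0), (3, 2)]` represent the string "abaab", and calling `qgram_frequencies` on them with `q = 2` yields the counts ab: 2, ba: 1, aa: 1 without ever materialising the full string.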