Theoretical Computer Science
Discovery of Frequent Episodes in Event Sequences
Data Mining and Knowledge Discovery
An Improved Pattern Matching Algorithm for Strings in Terms of Straight-Line Programs
CPM '97 Proceedings of the 8th Annual Symposium on Combinatorial Pattern Matching
Offline Dictionary-Based Compression
DCC '99 Proceedings of the Conference on Data Compression
Compressed string-matching in standard Sturmian words
Theoretical Computer Science
Self-indexed Text Compression Using Straight-Line Programs
MFCS '09 Proceedings of the 34th International Symposium on Mathematical Foundations of Computer Science 2009
Towards approximate matching in compressed strings: local subsequence recognition
CSR'11 Proceedings of the 6th international conference on Computer science: theory and applications
Window subsequence problems for compressed texts
CSR'06 Proceedings of the First international computer science conference on Theory and Applications
Querying and embedding compressed texts
MFCS'06 Proceedings of the 31st international conference on Mathematical Foundations of Computer Science
Variable-Length codes for space-efficient grammar-based compression
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Subsequence pattern matching problems on compressed text were first considered by Cégielski et al. (Window Subsequence Problems for Compressed Texts, Proc. CSR 2006, LNCS 3967, pp. 127-136), where the principal problem is: given a string T represented as a straight-line program (SLP) of size n and a string P of size m, compute the number of minimal subsequence occurrences of P in T. We present an O(nm) time algorithm for solving all variations of the problem introduced by Cégielski et al. This improves on the previous best known algorithm of Tiskin (Towards approximate matching in compressed strings: Local subsequence recognition, Proc. CSR 2011), which runs in O(nm log m) time. We further show that our algorithms can be modified to solve a wider range of problems within the same O(nm) time bound, and present the first matching algorithms for patterns containing VLDC (variable-length don't-care) symbols, as well as for patterns containing FLDC (fixed-length don't-care) symbols, on SLP-compressed texts.
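To make the principal problem concrete, the following is a minimal brute-force sketch in Python. It is not the O(nm) algorithm described above: it first decompresses the SLP, which can take time exponential in n, and then checks the definition directly. It assumes an SLP in Chomsky normal form, and takes a minimal subsequence occurrence to be an interval [i, j] such that P is a subsequence of T[i..j] but of no proper subinterval. The names `expand`, `is_subsequence`, `minimal_occurrences` and the `rules` dictionary are hypothetical and chosen only for illustration.

```python
def expand(rules, symbol):
    """Decompress the string derived from an SLP variable.

    Assumed encoding: rules[symbol] is either a single character
    (terminal rule) or a pair (left, right) of variable names.
    """
    rhs = rules[symbol]
    if isinstance(rhs, str):
        return rhs
    left, right = rhs
    return expand(rules, left) + expand(rules, right)


def is_subsequence(p, t):
    """Return True if p can be obtained from t by deleting characters."""
    it = iter(t)
    return all(c in it for c in p)


def minimal_occurrences(t, p):
    """List all intervals (i, j), 0-based and inclusive, that are minimal
    subsequence occurrences of a non-empty pattern p in t."""
    occurrences = []
    for i in range(len(t)):
        for j in range(i, len(t)):
            if is_subsequence(p, t[i:j + 1]):
                # j is the earliest end for start i, so t[i..j-1] fails.
                # The window is minimal iff t[i+1..j] fails as well.
                if not is_subsequence(p, t[i + 1:j + 1]):
                    occurrences.append((i, j))
                break
    return occurrences


# A small SLP deriving the string "abaab".
rules = {
    "X1": "b",
    "X2": "a",
    "X3": ("X2", "X1"),  # "ab"
    "X4": ("X3", "X2"),  # "aba"
    "X5": ("X4", "X3"),  # "abaab"
}
text = expand(rules, "X5")
print(text, len(minimal_occurrences(text, "ab")))  # prints: abaab 2
```

For the pattern "ab", the windows [1, 4] and [2, 4] also contain the pattern as a subsequence, but each has the shorter window [3, 4] inside it, so only [0, 1] and [3, 4] are counted.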