Dictionary-based modelling is the mechanism used in many practical compression schemes. For example, the members of the two Ziv-Lempel families parse the input message into a sequence of phrases selected from a dictionary, and obtain compression since a reference to the phrase can be more compact than the phrase itself.

In most implementations of dictionary-based compression the encoder operates online, incrementally inferring its dictionary of available phrases from previous parts of the message, and adjusting its dictionary after the transmission of each phrase. Doing so allows the dictionary to be transmitted implicitly, since the decoder simultaneously makes similar adjustments to its dictionary.

An alternative approach, the topic explored in this paper, is to use the full message (or a large block of it) to infer a complete dictionary in advance, and include an explicit representation of the dictionary as part of the compressed message. Intuitively, the advantage of this offline approach is that with the benefit of having access to all of the message, it should be possible to optimize the choice of phrases so as to maximize compression performance. Indeed, we demonstrate that very good compression can be attained by an offline method without compromising the fast decoding that is a distinguishing characteristic of dictionary-based techniques.

Several nontrivial sources of overhead (in terms of both computation resources required to perform the compression, and bits generated into the compressed message) have to be carefully managed as part of the offline process. To meet this challenge, we have developed a novel phrase derivation method and a compact dictionary encoding. In combination these two techniques produce the compression scheme Re-Pair, which is highly efficient, particularly in decompression.

It should also be noted that while offline compression involves the disadvantage of having to store a large part of the message in memory for processing, the difference between doing this and storing the growing dictionary of an online compressor is illusory. Indeed, incremental dictionary-based algorithms maintain an equally large part of the message in memory as part of the dictionary; similarly, online predictive symbol-based context models occupy space that may be linear in the size of that part of the message on which prediction is based.

Our scheme is offline only while inferring the dictionary, and during decompression bits are read and phrases written in a fully interleaved manner. Moreover, during decoding only a compact representation of the dictionary must be stored. Thus, during decompression, our approach has a space advantage over both incremental dictionary-based schemes and over context-based source models.
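
The offline idea described above can be illustrated with a small sketch. The abstract does not describe the phrase derivation method or the dictionary encoding, so the rule used here (repeatedly replacing the most frequent adjacent pair of symbols with a new symbol) is an assumption chosen for illustration, and the function names `infer_dictionary`, `expand`, and `decode` are hypothetical. The sketch is a naive, quadratic-time toy and makes no attempt at the efficiency the abstract refers to; it only shows an explicit dictionary being inferred from the whole message and a decoder that needs nothing but that dictionary.

```python
from collections import Counter


def infer_dictionary(message, min_count=2):
    """Offline pass over the whole message (assumed pairing rule, for illustration).

    Returns the reduced symbol sequence and an explicit dictionary mapping each
    new symbol to the pair of symbols it stands for.
    """
    seq = list(message)
    rules = {}          # new symbol -> (left symbol, right symbol)
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break
        # Tuples cannot collide with the single-character terminals.
        symbol = ("R", next_id)
        next_id += 1
        rules[symbol] = pair
        # Rewrite the sequence left to right, replacing non-overlapping occurrences.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Expand one symbol into text using only the dictionary."""
    stack, out = [symbol], []
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)   # pushed first so that left is expanded first
            stack.append(left)
        else:
            out.append(s)
    return "".join(out)


def decode(seq, rules):
    """Decoder: reads one symbol at a time and writes its phrase immediately."""
    return "".join(expand(s, rules) for s in seq)


if __name__ == "__main__":
    text = "abracadabra abracadabra"
    seq, rules = infer_dictionary(text)
    assert decode(seq, rules) == text
    print(len(text), "characters ->", len(seq), "symbols and", len(rules), "rules")
```

In this toy version the decoder holds only `rules` and the symbol currently being expanded, which mirrors the abstract's point that decoding interleaves reading and writing and keeps just the dictionary in memory. A real implementation would also need the compact dictionary encoding mentioned above, which is not sketched here.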