Elements of information theory
Elements of information theory
An introduction to Kolmogorov complexity and its applications (2nd ed.)
An introduction to Kolmogorov complexity and its applications (2nd ed.)
The space complexity of approximating the frequency moments
Journal of Computer and System Sciences
Towards estimation error guarantees for distinct values
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Sampling algorithms: lower bounds and applications
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Approximation algorithms for grammar-based compression
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Using Literal and Grammatical Statistics for Authorship Attribution
Problems of Information Transmission
DNA Sequence Classification Using Compression-Based Induction
DNA Sequence Classification Using Compression-Based Induction
The Complexity of Approximating the Entropy
SIAM Journal on Computing
Similarity of objects and the meaning of words
TAMC'06 Proceedings of the Third international conference on Theory and Applications of Models of Computation
IEEE Transactions on Information Theory
IEEE Transactions on Information Theory
SIAM Journal on Discrete Mathematics
Hi-index | 0.00 |
We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and Lempel-Ziv (LZ), and present sublinear algorithms for approximating compressibility with respect to both schemes. We also give several lower bounds that show that our algorithms for both schemes cannot be improved significantly.Our investigation of LZ yields results whose interest goes beyond the initial questions we set out to study. In particular, we prove combinatorial structural lemmas that relate the compressibility of a string with respect to Lempel-Ziv to the number of distinct short substrings contained in it. In addition, we show that approximating the compressibility with respect to LZ is related to approximating the support size of a distribution.