Approximation algorithms for grammar-based compression

  • Authors:
  • Eric Lehman;Abhi Shelat

  • Affiliations:
  • MIT Laboratory for Computer Science, Cambridge, MA;MIT Laboratory for Computer Science, Cambridge, MA

  • Venue:
  • SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

Several recently-proposed data compression algorithms are based on the idea of representing a string by a context-free grammar. Most of these algorithms are known to be asymptotically optimal with respect to a stationary ergodic source and to achieve a low redundancy rate. However, such results do not reveal how effectively these algorithms exploit the grammar-model itself; that is, are the compressed strings produced as small as possible? We address this issue by analyzing the approximation ratio of several algorithms, that is, the maximum ratio between the size of the generated grammar and the smallest possible grammar over all inputs. On the negative side, we show that every polynomial-time grammar-compression algorithm has approximation ratio at least 8569/8568 unless P = NP. Moreover, achieving an approximation ratio of o(log n/log log n) would require progress on an algebraic problem in a well-studied area. We then upper and lower bound approximation ratios for the following four previously-proposed grammar-based compression algorithms: SEQUENTIAL, BISECTION, GREEDY, and LZ78, each of which employs a distinct approach to compression. These results seem to indicate that there is much room to improve grammar-based compression algorithms.