Split-stream dictionary program compression

  • Author: Steven Lucco
  • Affiliation: Transmeta, 3940 Freedom Circle, Santa Clara, CA
  • Venue: PLDI '00: Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation
  • Year: 2000

Abstract

This paper describes split-stream dictionary (SSD) compression, a new technique for transforming programs into a compact, interpretable form. We define a compressed program as interpretable when it can be decompressed at basic-block granularity with reasonable efficiency. The granularity requirement enables interpreters or just-in-time (JIT) translators to decompress basic blocks incrementally during program execution. Our previous approach to interpretable compression, the Byte-coded RISC (BRISC) program format [1], achieved unprecedented decompression speed in excess of 5 megabytes per second on a 450MHz Pentium II while compressing benchmark programs to an average of three-fifths the size of their optimized x86 representation. SSD compression combines the key idea behind BRISC with new observations about instruction re-use frequencies to yield four advantages over BRISC and other competing techniques. First, SSD is simple, requiring only a few pages of code for an effective implementation. Second, SSD compresses programs more effectively than any interpretable program compression scheme known to us. For example, SSD compressed a set of programs including the SPEC95 benchmarks and Microsoft Word97 to less than half the size, on average, of their optimized x86 representation. Third, SSD exceeds BRISC's decompression and JIT translation rates by over 50%. Finally, SSD's two-phase approach to JIT translation enables a virtual machine to provide graceful degradation of program execution time in the face of increasing RAM constraints. For example, using SSD, we ran Word97 using a JIT-translation buffer one-third the size of Word97's optimized x86 code, yet incurred only 27% execution time overhead.
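To make the "interpretable" property concrete, the toy sketch below shows dictionary-based compression in which every basic block is encoded independently. Each block is stored as a list of indices into one shared instruction dictionary, so an interpreter can decode any single block without touching the others. This is **not** Lucco's SSD algorithm. The instruction tuples, the function names (`build_dictionary`, `compress`, `decompress_block`), and the choice to order the dictionary by re-use frequency are all hypothetical, chosen only for illustration. Frequency ordering is an assumption meant to echo the abstract's mention of instruction re-use frequencies: it gives common instructions small indices that a later variable-length code could exploit.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction representation: (opcode, operand, ...)
Instr = Tuple[str, ...]


def build_dictionary(blocks: List[List[Instr]]) -> List[Instr]:
    """Collect distinct instructions, most frequently re-used first."""
    counts: Dict[Instr, int] = {}
    for block in blocks:
        for ins in block:
            counts[ins] = counts.get(ins, 0) + 1
    # Descending frequency; ties broken by the instruction tuple for determinism.
    return sorted(counts, key=lambda ins: (-counts[ins], ins))


def compress(blocks: List[List[Instr]], dictionary: List[Instr]) -> List[List[int]]:
    """Encode each basic block separately as a list of dictionary indices."""
    index = {ins: i for i, ins in enumerate(dictionary)}
    return [[index[ins] for ins in block] for block in blocks]


def decompress_block(encoded: List[List[int]], dictionary: List[Instr],
                     block_id: int) -> List[Instr]:
    """Decode one basic block on demand, without decoding any other block."""
    return [dictionary[i] for i in encoded[block_id]]


blocks = [
    [("mov", "eax", "1"), ("add", "eax", "ebx"), ("ret",)],
    [("mov", "eax", "1"), ("ret",)],
]
dictionary = build_dictionary(blocks)
encoded = compress(blocks, dictionary)
print(encoded)                                   # [[0, 2, 1], [0, 1]]
print(decompress_block(encoded, dictionary, 1))  # [('mov', 'eax', '1'), ('ret',)]
```

Because each encoded block refers only to the shared dictionary, a JIT translator could call `decompress_block` lazily, the first time control reaches a block. This is the incremental, block-at-a-time decompression that the abstract's definition of interpretability requires.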