PPMexe: Program compression

  • Authors:
  • Milenko Drinić;Darko Kirovski;Hoi Vo

  • Affiliations:
  • Microsoft Research, Redmond, WA;Microsoft Research, Redmond, WA;Microsoft Research, Redmond, WA

  • Venue:
  • ACM Transactions on Programming Languages and Systems (TOPLAS)
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

With the emergence of software delivery platforms, code compression has become an important system component that strongly affects performance. This article presents PPMexe, a compression mechanism for program binaries that analyzes their syntax and semantics to achieve superior compression ratios. We use the generic paradigm of prediction by partial matching (PPM) as the foundation of our compression codec. PPMexe combines PPM with two preprocessing steps: (i) instruction rescheduling to improve prediction rates and (ii) heuristic partitioning of a program binary into streams with high autocorrelation. We improve the traditional PPM algorithm by (iii) using an additional alphabet of frequent variable-length supersymbols extracted from the input stream of fixed-length symbols. In addition, PPMexe features (iv) a low-overhead mechanism that enables decompression starting from an arbitrary instruction of the executable, a property pivotal for runtime software delivery. We implemented PPMexe for x86 binaries and tested it on several large applications. Binaries compressed using PPMexe were 18--24% smaller than files created using off-the-shelf PPMD, one of the best available compressors