Automatic inference of models for statistical code compression

  • Authors:
  • Christopher W. Fraser

  • Affiliations:
  • Microsoft Research, One Microsoft Way, Redmond, WA

  • Venue:
  • Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes experiments that apply machine learning to compress computer programs, formalizing and automating decisions about instruction encoding that have traditionally been made by humans in a more ad hoc manner. A program accepts a large training set of program material in a conventional compiler intermediate representation (IR) and automatically infers a decision tree that separates IR code into streams that compress much better than the undifferentiated whole. Driving a conventional arithmetic compressor with this model yields code 30% smaller than the previous record for IR code compression, and 24% smaller than an ambitious optimizing compiler feeding an ambitious general-purpose data compressor.