Reduced code size modulo scheduling in the absence of hardware support

  • Authors:
  • Josep Llosa;Stefan M. Freudenberger

  • Affiliations:
  • Universitat Politècnica de Catalunya;Hewlett-Packard Laboratories, Cambridge, Mass

  • Venue:
  • Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

Modulo scheduling is a very effective instruction scheduling technique that exploits Instruction Level Parallelism (ILP) in loop bodies by overlapping the execution of successive iterations. Unfortunately, modulo scheduling has been shown to cause heavy code expansion. To avoid the penalties of code expansion, some processors have dedicated hardware support for modulo scheduled loops. However, this dedicated hardware support has a cost in chip area, cycle time, processor complexity, and compiler complexity.This paper shows that the right combination of scheduling heuristics combined with speculative modulo scheduling can significantly reduce code expansion. In addition, several code generation schema heuristics are proposed to further reduce code expansion. The evaluations show that loops can be effectively modulo scheduled with an average code expansion only 1.5 times the original loop size. Compared with a state of the art modulo scheduler, our code size sensitive heuristics reduce the size of embedded domain benchmarks binaries by 30% on average. While performance is mostly unchanged, some applications show speed-ups up to 20% due to a reduction in instruction cache capacity misses.