Improved spill code generation for software pipelined loops

  • Authors:
  • Javier Zalamea;Josep Llosa;Eduard Ayguadé;Mateo Valero

  • Affiliations:
  • Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, cr. Jordi Girona 1-3, Mòdul D6, Campus Nord, 08034, Barcelona, SPAIN (all authors)

  • Venue:
  • PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
  • Year:
  • 2000

Abstract

Software pipelining is a loop scheduling technique that extracts parallelism out of loops by overlapping the execution of several consecutive iterations. Due to the overlapping of iterations, schedules impose high register requirements during their execution. A schedule is valid if it requires at most the number of registers available in the target architecture. If not, its register requirements have to be reduced either by decreasing the iteration overlapping or by spilling registers to memory. In this paper we describe a set of heuristics to increase the quality of register-constrained modulo schedules. The heuristics decide between the two previous alternatives and define criteria for effectively selecting spilling candidates. The heuristics proposed for reducing the register pressure can be applied to any software pipelining technique. The proposals are evaluated using a register-conscious software pipeliner on a workbench composed of a large set of loops from the Perfect Club benchmark and a set of processor configurations. Proposals in this paper are compared against a previous proposal already described in the literature. For one of these processor configurations and the set of loops that do not fit in the available registers (32), a speed-up of 1.68 and a reduction of the memory traffic by a factor of 0.57 are achieved with an affordable increase in compilation time. For all the loops, this represents a speed-up of 1.38 and a reduction of the memory traffic by a factor of 0.7.
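
The validity condition in the abstract can be illustrated with a small sketch. It is not the heuristics of the paper; it only shows, under simplifying assumptions, why less iteration overlap means lower register pressure. The sketch assumes each value has a lifetime given as a half-open cycle interval within one iteration's schedule, and it approximates the register requirement by the maximum number of simultaneously live values across the slots of the initiation interval (commonly called MaxLive in the modulo scheduling literature). The names `max_live`, `fits`, `lifetimes`, `ii`, and `num_regs` are hypothetical and chosen for illustration only.

```python
def max_live(lifetimes, ii):
    """Approximate register pressure of a modulo schedule.

    lifetimes: list of (start, end) cycle pairs, end exclusive, for the
               values produced in one iteration (illustrative input).
    ii:        initiation interval, the number of cycles between the
               start of consecutive iterations.
    """
    pressure = [0] * ii
    for start, end in lifetimes:
        # In steady state, a value live at cycle c occupies slot c mod ii,
        # because copies from overlapped iterations are live at once.
        for cycle in range(start, end):
            pressure[cycle % ii] += 1
    return max(pressure)


def fits(lifetimes, ii, num_regs):
    """A schedule is treated as valid if MaxLive does not exceed num_regs."""
    return max_live(lifetimes, ii) <= num_regs


# Made-up lifetimes for three values, purely for illustration.
lifetimes = [(0, 4), (1, 3), (2, 7)]

tight = max_live(lifetimes, ii=2)    # more overlap between iterations
relaxed = max_live(lifetimes, ii=4)  # less overlap between iterations
```

With these made-up lifetimes, the larger initiation interval yields the lower pressure, which corresponds to the first alternative named in the abstract (decreasing the iteration overlapping). Spilling, the second alternative, would instead shorten or remove individual lifetimes at the cost of extra memory operations.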