Register tiling in nonrectangular iteration spaces

  • Authors:
  • Marta Jiménez;José M. Llabería;Agustín Fernández

  • Affiliations:
  • Universitat Politècnica de Catalunya, Barcelona, Spain;Universitat Politècnica de Catalunya, Barcelona, Spain;Universitat Politècnica de Catalunya, Barcelona, Spain

  • Venue:
  • ACM Transactions on Programming Languages and Systems (TOPLAS)
  • Year:
  • 2002

Quantified Score

Hi-index 0.01

Visualization

Abstract

Loop tiling is a well-known loop transformation generally used to expose coarse-grain parallelism and to exploit data reuse at the cache level. Tiling can also be used to exploit data reuse at the register level and to improve a program's ILP. However, previous proposals in the literature (as well as commercial compilers) are only able to perform multidimensional tiling for the register level when the iteration space is rectangular. In this article we present a new general algorithm to perform multidimensional tiling for the register level in both rectangular and nonrectangular iteration spaces. We also propose a simple heuristic to determine the tiling parameters at this level. Finally, we evaluate our method using as benchmarks typical linear algebra algorithms having nonrectangular iteration spaces and compare our proposal against hand-optimized vendor-supplied numerical libraries and against commercial compilers able to perform optimizing code transformations such as inner unrolling, unroll-and-jam, and software pipelining. Measurements were taken on three different superscalar microprocessors. Results will show that our method outperforms the native compilers (showing speedups of 2.5 in average) and matches the performance of vendor-supplied numerical libraries. The general conclusion is that compiler technology can make it possible for nonrectangular loop nests to achieve as high performance as hand-optimized codes.