Hierarchical tiling for improved superscalar performance

  • Authors:
  • Larry Carter;Jeanne Ferrante;Susan Flynn Hummel

  • Affiliations:
  • -;-;-

  • Venue:
  • IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
  • Year:
  • 1995

Quantified Score

Hi-index 0.01

Visualization

Abstract

It takes more than a good algorithm to achieve high performance: inner-loop performance and data locality are also important. Tiling is a well-known method for parallelization and for improving data locality. However, tiling has the potential of being even more beneficial. At the finest granularity, it can be used to guide register allocation and instruction scheduling; at the coarsest level, it can help manage magnetic storage media. It also can be useful in overlapping data movement with computation, for instance by prefetching data from archival storage, disks and main memory into cache and registers, or by choreographing data movement between processors. Hierarchical tiling is a framework for applying both known tiling methods and new techniques to an expanded set of uses. It eases the burden on several compiler phases that are traditionally treated separately, such as scalar replacement, register allocation, generation of message passing calls, and storage mapping. By explicitly naming and copying data, it takes control of the mapping of data to memory and of the movement of data between processing elements and up and down the memory hierarchy. This paper focuses on using hierarchical tiling to exploit superscalar pipelined processors. On a simple example, it improves performance by a factor of 3, achieving perfect use of the superscalar processor's pipeline. Hierarchical tiling is presented here as a method of hand-tuning performance; while outside the scope of this paper, the ideas can be incorporated into an automatic preprocessor or optimizing compiler.