Memory reuse optimizations in the R-Stream compiler

  • Authors:
  • Nicolas Vasilache; Muthu Baskaran; Benoit Meister; Richard Lethin

  • Affiliations:
  • Reservoir Labs Inc., New York, NY (all authors)

  • Venue:
  • Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
  • Year:
  • 2013

Abstract

We propose a new set of automated techniques for optimizing memory reuse in programs with explicitly managed memory. Our techniques are inspired by hand-tuned seismic kernels on GPUs. The solutions we develop reduce the cost of transferring data across multiple memories with different bandwidth, latency, and addressability properties. They reduce the volume of communication with main memory and achieve execution speeds comparable to hand-tuned implementations for out-of-place stencils. We discuss the stages of our source-to-source compiler infrastructure and focus on specific optimizations, which comprise: flexible generation of communications at different granularities with respect to computations, elimination of redundant transfers, reuse of data across processing elements through a globally addressable local memory, and reuse of data within the same processing element through a local private memory. The memory models we consider support the GPU model with device, shared, and register memories. The techniques we derive are generally applicable, and their formulation within our compiler can be extended to other types of architectures.
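
To make the two reuse levels named in the abstract concrete, the following is a minimal CUDA sketch, not the authors' R-Stream-generated code, of an out-of-place 1D 3-point stencil: each element is staged once from device (global) memory into shared memory, where neighboring threads of the same block reuse it, and the center value is held in a register within each thread. The kernel name stencil1d and the parameters TILE and RADIUS are illustrative assumptions, not taken from the paper.

```cuda
// Minimal sketch of shared-memory and register reuse for an out-of-place
// 1D 3-point stencil. Illustrative only; names and sizes are assumptions.
#include <cstdio>
#include <cuda_runtime.h>

#define RADIUS 1     // stencil half-width (assumed)
#define TILE   256   // threads per block (assumed)

__global__ void stencil1d(const float* __restrict__ in,
                          float* __restrict__ out, int n)
{
    // Block-local shared memory: each input element is read from device
    // memory once and then reused by neighboring threads of the block.
    __shared__ float tile[TILE + 2 * RADIUS];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + RADIUS;

    if (gid < n) {
        tile[lid] = in[gid];                 // one global load per element
        if (threadIdx.x < RADIUS) {          // first threads also load the halo
            tile[lid - RADIUS] = (gid >= RADIUS) ? in[gid - RADIUS] : 0.0f;
            int hi = gid + blockDim.x;
            tile[lid + blockDim.x] = (hi < n) ? in[hi] : 0.0f;
        }
    }
    __syncthreads();

    if (gid < n) {
        float center = tile[lid];            // register-level reuse within a thread
        out[gid] = 0.25f * tile[lid - 1] + 0.5f * center + 0.25f * tile[lid + 1];
    }
}

int main()
{
    const int n = 1 << 20;                   // divisible by TILE for simplicity
    float *in, *out;
    cudaMallocManaged(&in,  n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = static_cast<float>(i % 7);

    stencil1d<<<(n + TILE - 1) / TILE, TILE>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[10] = %f\n", out[10]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

In this sketch the shared-memory tile plays the role of the globally addressable local memory shared across the processing elements of a block, while the register-held center value stands for reuse within a single processing element; the paper's contribution is generating and optimizing such staging automatically in the compiler rather than by hand.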