Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model

  • Authors:
  • Uday Bondhugula; Muthu Baskaran; Sriram Krishnamoorthy; J. Ramanujam; Atanas Rountev; P. Sadayappan

  • Affiliations:
  • Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH; Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH; Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH; Dept. of Electrical and Computer Engg., Louisiana State University, Baton Rouge, LA; Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH; Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH

  • Venue:
  • CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
  • Year:
  • 2008

Abstract

The polyhedral model provides powerful abstractions to optimize loop nests with regular accesses. Affine transformations in this model capture a complex sequence of execution-reordering loop transformations that can improve performance by parallelization as well as locality enhancement. Although a significant body of research has addressed affine scheduling and partitioning, the problem of automatically finding good affine transforms for communication-optimized coarse-grained parallelization together with locality optimization for the general case of arbitrarily-nested loop sequences remains a challenging problem. We propose an automatic transformation framework to optimize arbitrarily-nested loop sequences with affine dependences for parallelism and locality simultaneously. The approach finds good tiling hyperplanes by embedding a powerful and versatile cost function into an Integer Linear Programming formulation. These tiling hyperplanes are used for communication-minimized coarse-grained parallelization as well as for locality optimization. The approach enables the minimization of inter-tile communication volume in the processor space, and minimization of reuse distances for local execution at each node. Programs requiring one-dimensional versus multi-dimensional time schedules (with scheduling-based approaches) are all handled with the same algorithm. Synchronization-free parallelism, permutable loops or pipelined parallelism at various levels can be detected. Preliminary studies of the framework show promising results.
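
The idea of choosing tiling hyperplanes by minimizing a cost can be illustrated with a toy sketch. This is not the paper's Integer Linear Programming formulation. It makes several simplifying assumptions: dependences are uniform (constant distance vectors), hyperplane coefficients are small non-negative integers found by brute-force enumeration, and the cost of a hyperplane is taken to be the largest dependence distance measured along it. The function names `dot`, `rank`, and `tiling_hyperplanes` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product


def dot(h, d):
    """Distance a dependence vector d travels along hyperplane normal h."""
    return sum(a * b for a, b in zip(h, d))


def rank(rows):
    """Rank of a small integer matrix via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def tiling_hyperplanes(deps, dim, bound=2):
    """Greedy brute-force stand-in for the ILP (an assumption of this sketch).

    A hyperplane h is treated as legal for tiling when h . d >= 0 for every
    dependence d, so no dependence points backwards across tile boundaries.
    Legal candidates are ranked by their largest dependence distance, and
    linearly independent ones are picked in that order.
    """
    legal = [
        h for h in product(range(bound + 1), repeat=dim)
        if any(h) and all(dot(h, d) >= 0 for d in deps)
    ]
    legal.sort(key=lambda h: (max((dot(h, d) for d in deps), default=0), h))
    chosen = []
    for h in legal:
        if rank(chosen + [h]) == len(chosen) + 1:
            chosen.append(h)
        if len(chosen) == dim:
            break
    return chosen


# Uniform dependences of a 1-D three-point stencil over (time, space).
stencil_deps = [(1, -1), (1, 0), (1, 1)]
print(tiling_hyperplanes(stencil_deps, dim=2))  # [(1, 0), (1, 1)]
```

For the stencil, the space axis `(0, 1)` is rejected because the dependence `(1, -1)` would point backwards across it, so the sketch falls back to the skewed hyperplane `(1, 1)`. A real implementation would handle general affine dependences and solve for the coefficients with an ILP solver rather than enumerating them.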