Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model

Authors:
Uday Bondhugula;Muthu Baskaran;Sriram Krishnamoorthy;J. Ramanujam;Atanas Rountev;P. Sadayappan
Affiliations:
Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH;Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH;Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH;Dept. of Electrical and Computer Engg., Louisiana State University, Baton Rouge, LA;Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH;Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH
Venue:
CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
Year:
2008

Citing 21
Cited 26

Theory of linear and integer programming

Theory of linear and integer programming
Supernode partitioning

POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Some efficient solutions to the affine scheduling problem: I. One-dimensional time

International Journal of Parallel Programming
A singular loop transformation framework based on non-singular matrices

International Journal of Parallel Programming
A unifying framework for iteration reordering transformations

A unifying framework for iteration reordering transformations
Communication-minimal tiling of uniform dependence loops

Journal of Parallel and Distributed Computing
Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs

International Journal of Parallel Programming
Loop parallelization algorithms: from parallelism extraction to code generation

Parallel Computing - Special issues on languages and compilers for parallel computers
Maximizing parallelism and minimizing synchronization with affine partitions

Parallel Computing - Special issues on languages and compilers for parallel computers
An affine partitioning algorithm to maximize parallelism and minimize communication

ICS '99 Proceedings of the 13th international conference on Supercomputing
Generation of Efficient Nested Loops from Polyhedra

International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Blocking and array contraction across arbitrarily nested loops using affine partitioning

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests

International Journal of Parallel Programming
Automatic Blocking of Nested Loops

Automatic Blocking of Nested Loops
Code Generation in the Polyhedral Model Is Easier Than You Think

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Facilitating the search for compositions of program transformations

Proceedings of the 19th annual international conference on Supercomputing
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies

International Journal of Parallel Programming
Violated dependence analysis

Proceedings of the 20th annual international conference on Supercomputing
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Proceedings of the International Symposium on Code Generation and Optimization
Polyhedral code generation in the real world

CC'06 Proceedings of the 15th international conference on Compiler Construction

A compiler framework for optimization of affine loop nests for gpgpus

Proceedings of the 22nd annual international conference on Supercomputing
Iterative optimization in the polyhedral model: part ii, multidimensional time

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A practical automatic polyhedral parallelizer and locality optimizer

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Precise Management of Scratchpad Memories for Localising Array Accesses in Scientific Codes

CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Compact multi-dimensional kernel extraction for register tiling

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Optimizing shared cache behavior of chip multiprocessors

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
A model for fusion and code motion in an automatic parallelizing compiler

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Loop transformations: convexity, pruning and optimization

Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Parallel memory prediction for fused linear algebra kernels

ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Fault oblivious eXascale whitepaper

Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers
Optimizing data locality using array tiling

Proceedings of the International Conference on Computer-Aided Design
A data layout optimization framework for NUCA-based multicores

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Automatic C-to-CUDA code generation for affine programs

CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Predictive modeling in a polyhedral optimization space

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
On-chip cache hierarchy-aware tile scheduling for multicore machines

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Automatic privatization for parallel execution of loops

ICAISC'12 Proceedings of the 11th international conference on Artificial Intelligence and Soft Computing - Volume Part II
Tiling stencil computations to maximize parallelism

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Improving last level cache locality by integrating loop and data transformations

Proceedings of the International Conference on Computer-Aided Design
Improving high level synthesis optimization opportunity through polyhedral transformations

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
PolyGLoT: a polyhedral loop transformation framework for a graphical dataflow language

CC'13 Proceedings of the 22nd international conference on Compiler Construction
Compiling affine loop nests for distributed-memory parallel architectures

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Reshaping cache misses to improve row-buffer locality in multicore systems

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Self adaptive run time scheduling for the automatic parallelization of loops with the C2µTC/SL compiler

Parallel Computing
Revisiting loop fusion in the polyhedral framework

Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming

Quantified Score

Hi-index	0.00

Visualization

Abstract

The polyhedral model provides powerful abstractions to optimize loop nests with regular accesses. Affine transformations in this model capture a complex sequence of execution-reordering loop transformations that can improve performance by parallelization as well as locality enhancement. Although a significant body of research has addressed affine scheduling and partitioning, the problem of automaticallyfinding good affine transforms forcommunication-optimized coarsegrained parallelization together with locality optimization for the general case of arbitrarily-nested loop sequences remains a challenging problem. We propose an automatic transformation framework to optimize arbitrarilynested loop sequences with affine dependences for parallelism and locality simultaneously. The approach finds good tiling hyperplanes by embedding a powerful and versatile cost function into an Integer Linear Programming formulation. These tiling hyperplanes are used for communication-minimized coarse-grained parallelization as well as for locality optimization. The approach enables the minimization of inter-tile communication volume in the processor space, and minimization of reuse distances for local execution at each node. Programs requiring one-dimensional versusmulti-dimensional time schedules (with scheduling-based approaches) are all handled with the same algorithm. Synchronization-free parallelism, permutable loops or pipelined parallelismat various levels can be detected. Preliminary studies of the framework show promising results.