Effective automatic parallelization and locality optimization using the polyhedral model

  • Authors:
  • P. Sadayappan; Uday Kumar Reddy Bondhugula

  • Affiliations:
  • The Ohio State University; The Ohio State University

  • Venue:
  • Ph.D. dissertation, The Ohio State University
  • Year:
  • 2008

Abstract

Multicore processors have now become mainstream. The difficulty of programming these architectures to effectively tap the potential of multiple processing units is well known. Among several ways of addressing this issue, one of the most promising, yet simultaneously hardest, approaches is automatic parallelization, which requires no effort on the part of the programmer to parallelize a program. The polyhedral model for compiler optimization is a powerful mathematical framework based on parametric linear algebra and integer linear programming. It provides an abstraction for representing nested loop computations and their data dependences as integer points in polyhedra. Complex execution reordering, which can improve performance through parallelization as well as locality enhancement, is captured by affine transformations in the polyhedral model. With several recent advances, the polyhedral model has reached a level of maturity in various respects, in particular as a powerful intermediate representation for performing transformations and for code generation after transformations. However, an approach to automatically find good transformations for communication-optimized coarse-grained parallelization together with locality optimization has been a key missing link. This dissertation presents a new automatic transformation framework that solves the above problem. Our approach finds good affine transformations through a powerful and practical linear cost function that enables efficient tiling and fusion of sequences of arbitrarily nested loops, allowing simultaneous optimization for coarse-grained parallelism and locality. Synchronization-free parallelism and pipelined parallelism at various levels can be extracted. The framework can be targeted to different parallel architectures, such as general-purpose multicores, the Cell processor, GPUs, and embedded multiprocessor SoCs. The proposed framework has been implemented in a new end-to-end transformation tool, Pluto, that can automatically generate parallel code from regular C program sections. Experimental results from the implemented system show significant performance improvement for single-core and multicore execution over state-of-the-art research compiler frameworks as well as the best native production compilers. For several dense linear algebra kernels, code generated by Pluto beats, by a significant margin, the same kernels implemented with sequences of calls to highly tuned vendor-supplied libraries. The system also allows empirical optimization to be performed in a much wider context than has been attempted previously. In addition, Pluto can serve as the parallel code generation backend for several high-level domain-specific languages.
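
To make the kind of transformation described in the abstract concrete, the following is a minimal illustrative sketch, not actual Pluto output: a regular affine C loop nest of the sort the tool accepts, followed by a hand-written approximation of the tiled, coarse-grained parallel code such a framework is designed to produce. The tile size of 32, the array size N, and the OpenMP parallelization of the outermost tile loop are assumptions made for illustration only.

    #include <omp.h>

    #define N 1024
    #define T 32                        /* assumed tile size, for illustration */
    #define min(a, b) ((a) < (b) ? (a) : (b))

    double A[N][N], B[N][N], C[N][N];

    /* Input: a regular affine loop nest. Each iteration is an integer point
       (i, j, k) in the polyhedron { 0 <= i < N, 0 <= j < N, 0 <= k < N }. */
    void matmul(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];
    }

    /* Hand-written approximation of the transformed code: rectangular tiling
       for cache locality, with the outermost tile loop carrying
       synchronization-free parallelism via OpenMP. */
    void matmul_tiled_parallel(void)
    {
        #pragma omp parallel for
        for (int it = 0; it < N; it += T)
            for (int jt = 0; jt < N; jt += T)
                for (int kt = 0; kt < N; kt += T)
                    for (int i = it; i < min(it + T, N); i++)
                        for (int j = jt; j < min(jt + T, N); j++)
                            for (int k = kt; k < min(kt + T, N); k++)
                                C[i][j] += A[i][k] * B[k][j];
    }

In the actual framework, the legality of tiling and the choice of which tile loops run in parallel are derived from dependence analysis and the affine cost function rather than written by hand; for this particular kernel all three loops are permutable, so rectangular tiling and parallelizing the outer tile loop are valid.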