Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies

Authors:
Sylvain Girbal;Nicolas Vasilache;Cédric Bastoul;Albert Cohen;David Parello;Marc Sigler;Olivier Temam
Affiliations:
ALCHEMY Group, INRIA Futurs and LRI, Paris-Sud University, France;ALCHEMY Group, INRIA Futurs and LRI, Paris-Sud University, France;ALCHEMY Group, INRIA Futurs and LRI, Paris-Sud University, France;ALCHEMY Group, INRIA Futurs and LRI, Paris-Sud University, France;DALI Group, LP2A, University of Perpignan, France;ALCHEMY Group, INRIA Futurs and LRI, Paris-Sud University, France;ALCHEMY Group, INRIA Futurs and LRI, Paris-Sud University, France
Venue:
International Journal of Parallel Programming
Year:
2006

Citing 49
Cited 44

Automatic translation of FORTRAN programs to vector form

ACM Transactions on Programming Languages and Systems (TOPLAS)
Array expansion

ICS '88 Proceedings of the 2nd international conference on Supercomputing
Semantical interprocedural parallelization: an overview of the PIPS project

ICS '91 Proceedings of the 5th international conference on Supercomputing
Uniform techniques for loop optimization

ICS '91 Proceedings of the 5th international conference on Supercomputing
Scanning polyhedra with DO loops

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
The Omega test: a fast and practical integer programming algorithm for dependence analysis

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Array-data flow analysis and its use in array privatization

POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Improving locality and parallelism in nested loops

Improving locality and parallelism in nested loops
Some efficient solutions to the affine scheduling problem: I. One-dimensional time

International Journal of Parallel Programming
Mapping uniform loop nests onto distributed memory architectures

Parallel Computing
Automating non-unimodular loop transformations for massive parallelism

Parallel Computing
A singular loop transformation framework based on non-singular matrices

International Journal of Parallel Programming
Automatic parallelization of while-loops using speculative execution

International Journal of Parallel Programming
Fuzzy array dataflow analysis

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Constraint-based array dependence analysis

Constraint-based array dependence analysis
Fuzzy array dataflow analysis

Journal of Parallel and Distributed Computing
Maximizing parallelism and minimizing synchronization with affine transforms

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Maximal static expansion

POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatic storage management for parallel programs

Parallel Computing - Special issues on languages and compilers for parallel computers
Schedule-independent storage mapping for loops

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization

IEEE Transactions on Parallel and Distributed Systems
Synthesizing transformations for locality enhancement of imperfectly-nested loop nests

Proceedings of the 14th international conference on Supercomputing
Maximal Static Expansion

International Journal of Parallel Programming
Generation of Efficient Nested Loops from Polyhedra

International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Blocking and array contraction across arbitrarily nested loops using affine partitioning

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Reasoning about Program Transformations

Reasoning about Program Transformations
Dependence Analysis for Supercomputing

Dependence Analysis for Supercomputing
Scheduling and Automatic Parallelization

Scheduling and Automatic Parallelization
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
Adaptive Optimizing Compilers for the 21st Century

The Journal of Supercomputing
Automatic Intra-Register Vectorization for the Intel® Architecture

International Journal of Parallel Programming
Parallel Programming with Polaris

Computer
Maximizing Multiprocessor Performance with the SUIF Compiler

Computer
Automatic Parallelization of Fortran Programs in the Presence of Procedure Calls

ESOP '86 Proceedings of the European Symposium on Programming
Automatic Array Privatization

Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Generation of Synchronous Code for Automatic Parallelization of while Loops

Euro-Par '95 Proceedings of the First International Euro-Par Conference on Parallel Processing
On increasing architecture awareness in program optimizations to bridge the gap between peak and sustained processor performance: matrix-multiply revisited

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Code generation for multiple mappings

FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
Improving Software Pipelining With Unroll-and-Jam

HICSS '96 Proceedings of the 29th Hawaii International Conference on System Sciences Volume 1: Software Technology and Architecture
Vectorization for SIMD architectures with alignment constraints

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Evaluating support for global address space languages on the Cray X1

Proceedings of the 18th annual international conference on Supercomputing
Adaptive java optimisation using instance-based learning

Proceedings of the 18th annual international conference on Supercomputing
Code Generation in the Polyhedral Model Is Easier Than You Think

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
The Value Evolution Graph and its Use in Memory Reference Analysis

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Towards a Systematic, Pragmatic and Architecture-Aware Program Optimization Process for Complex Processors

Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Facilitating the search for compositions of program transformations

Proceedings of the 19th annual international conference on Supercomputing
Improving data locality by chunking

CC'03 Proceedings of the 12th international conference on Compiler construction
Efficient code generation for automatic parallelization and optimization

ISPDC'03 Proceedings of the Second international conference on Parallel and distributed computing
Polyhedral code generation in the real world

CC'06 Proceedings of the 15th international conference on Compiler Construction

Violated dependence analysis

Proceedings of the 20th annual international conference on Supercomputing
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Proceedings of the International Symposium on Code Generation and Optimization
Incremental hierarchical memory size estimation for steering of loop transformations

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Code-size conscious pipelining of imperfectly nested loops

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Iterative optimization in the polyhedral model: part ii, multidimensional time

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A practical automatic polyhedral parallelizer and locality optimizer

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A tuning framework for software-managed memory hierarchies

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Software Pipelining in Nested Loops with Prolog-Epilog Merging

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Playing the trade-off game: Architecture exploration using Coffeee

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Design of a real-time face detection parallel architecture using high-level synthesis

EURASIP Journal on Embedded Systems - Special issue on design and architectures for signal and image processing
SARA: StreAm register allocation

CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
COFFEE: compiler framework for energy-aware exploration

HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model

CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
Processor virtualization and split compilation for heterogeneous multicore embedded systems

Proceedings of the 47th Design Automation Conference
A model for fusion and code motion in an automatic parallelizing compiler

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Data layout transformation exploiting memory-level parallelism in structured grid many-core applications

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Loop transformations: convexity, pruning and optimization

Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A programming language interface to describe transformations and code generation

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Constructing application-specific memory hierarchies on FPGAs

Transactions on high-performance embedded architectures and compilers III
Auto-tuning full applications: A case study

International Journal of High Performance Computing Applications
Performance analysis and tuning of automatically parallelized OpenMP applications

IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
Repetitive model refactoring strategy for the design space exploration of intensive signal processing applications

Journal of Systems Architecture: the EUROMICRO Journal
Loop transformation recipes for code generation and auto-tuning

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Unrolling loops containing task parallelism

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
The polyhedral model is more widely applicable than you think

CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Extendable pattern-oriented optimization directives

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Predictive modeling in a polyhedral optimization space

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Optimizing I/O for big array analytics

Proceedings of the VLDB Endowment
Optimizing memory hierarchy allocation with loop transformations for high-level synthesis

Proceedings of the 49th Annual Design Automation Conference
Polyhedra scanning revisited

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Extendable pattern-oriented optimization directives

ACM Transactions on Architecture and Code Optimization (TACO)
Portable section-level tuning of compiler parallelized applications

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A script-based autotuning compiler system to generate high-performance CUDA code

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Tulipse: a visualization framework for user-guided parallelization

Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Exact dependence analysis for increased communication overlap

EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Polyhedral-based data reuse optimization for configurable computing

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Memory reuse optimizations in the R-Stream compiler

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
When polyhedral transformations meet SIMD code generation

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Loop acceleration exploration for ASIP architecture

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Towards making autotuning mainstream

International Journal of High Performance Computing Applications
Revisiting loop fusion in the polyhedral framework

Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving polyhedral code generation for high-level synthesis

Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis

Quantified Score

Hi-index	0.00

Visualization

Abstract

Modern compilers are responsible for translating the idealistic operational semantics of the source program into a form that makes efficient use of a highly complex heterogeneous machine. Since optimization problems are associated with huge and unstructured search spaces, this combinational task is poorly achieved in general, resulting in weak scalability and disappointing sustained performance. We address this challenge by working on the program representation itself, using a semi-automatic optimization approach to demonstrate that current compilers offen suffer from unnecessary constraints and intricacies that can be avoided in a semantically richer transformation framework. Technically, the purpose of this paper is threefold: (1) to show that syntactic code representations close to the operational semantics lead to rigid phase ordering and cumbersome expression of architecture-aware loop transformations, (2) to illustrate how complex transformation sequences may be needed to achieve significant performance benefits, (3) to facilitate the automatic search for program transformation sequences, improving on classical polyhedral representations to better support operation research strategies in a simpler, structured search space. The proposed framework relies on a unified polyhedral representation of loops and statements, using normalization rules to allow flexible and expressive transformation sequencing. This representation allows to extend the scalability of polyhedral dependence analysis, and to delay the (automatic) legality checks until the end of a transformation sequence. Our work leverages on algorithmic advances in polyhedral code generation and has been implemented in a modern research compiler.