Generation of Efficient Nested Loops from Polyhedra

Authors:
Fabien Quilleré;Sanjay Rajopadhye;Doran Wilde
Affiliations:
Irisa, Campus de Beaulieu, F-35042 Rennes Cedex, Rennes, France;Irisa, Campus de Beaulieu, F-35042 Rennes Cedex, Rennes, France;Brigham Young University, Provo, Utah
Venue:
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Year:
2000

Abstract

Automatic parallelization in the polyhedral model is based on affine transformations from an original computation domain (iteration space) to a target space-time domain, often with a different transformation for each variable. Code generation is an often-ignored step in this process, yet it has a significant impact on the quality of the final code. It involves a trade-off between code size and the simplification and optimization of control code. Previous code generation methods are based on loop splitting; however, they behave non-optimally on parameterized programs. We present a general parameterized method for code generation based on the dual representation of polyhedra. Our algorithm uses a simple recursion on the dimensions of the domains and enables fine control over the trade-off between code size and control overhead.
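
The trade-off between code size and control overhead can be illustrated with a small one-dimensional sketch. This is not the algorithm of the paper, only a hand-written illustration under stated assumptions: two hypothetical statements S1 and S2 with domains 0 <= i < n and lo <= i < hi, and the names `scan_with_guards` and `scan_split` are invented for this example. The first version is compact but tests both domains at every iteration; the second has no guards but needs three loops, and is only valid when the parameters satisfy 0 <= lo <= n <= hi, which hints at why parameterized programs make splitting harder.

```python
def scan_with_guards(n, lo, hi):
    """Compact code: one loop over the bounding interval, one guard per statement."""
    trace = []
    guard_tests = 0  # counts the domain tests evaluated at run time
    for i in range(min(0, lo), max(n, hi)):
        guard_tests += 2
        if 0 <= i < n:        # domain of hypothetical statement S1
            trace.append(("S1", i))
        if lo <= i < hi:      # domain of hypothetical statement S2
            trace.append(("S2", i))
    return trace, guard_tests


def scan_split(n, lo, hi):
    """Larger code: three guard-free loops. Assumes 0 <= lo <= n <= hi."""
    if not (0 <= lo <= n <= hi):
        raise ValueError("this split is only valid when 0 <= lo <= n <= hi")
    trace = []
    for i in range(0, lo):    # only S1 is active
        trace.append(("S1", i))
    for i in range(lo, n):    # both statements are active
        trace.append(("S1", i))
        trace.append(("S2", i))
    for i in range(n, hi):    # only S2 is active
        trace.append(("S2", i))
    return trace


if __name__ == "__main__":
    guarded, tests = scan_with_guards(5, 3, 8)
    split = scan_split(5, 3, 8)
    print(guarded == split)   # both versions execute the same instances in the same order
    print(tests)              # number of guard evaluations in the compact version
```

Both functions visit the same statement instances in the same order; they differ only in how much control code is executed versus how much loop code is written. A parameter ordering other than the assumed one would require a different split, so a generator that handles parameters has to either emit several such variants or fall back on guards.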