Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Authors:
Louis-Noel Pouchet;Cedric Bastoul;Albert Cohen;Nicolas Vasilache
Affiliations:
ALCHEMY Group, INRIA FUTURS and Paris-Sud University;ALCHEMY Group, INRIA FUTURS and Paris-Sud University;ALCHEMY Group, INRIA FUTURS and Paris-Sud University;ALCHEMY Group, INRIA FUTURS and Paris-Sud University
Venue:
Proceedings of the International Symposium on Code Generation and Optimization
Year:
2007

Citing 32
Cited 25

Theory of linear and integer programming

Theory of linear and integer programming
Scanning polyhedra with DO loops

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Improving locality and parallelism in nested loops

Improving locality and parallelism in nested loops
Some efficient solutions to the affine scheduling problem: I. One-dimensional time

International Journal of Parallel Programming
Improving data locality with loop transformations

ACM Transactions on Programming Languages and Systems (TOPLAS)
Data-centric multi-level blocking

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Maximizing parallelism and minimizing synchronization with affine transforms

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Generation of Efficient Nested Loops from Polyhedra

International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Optimizing compilers for modern architectures: a dependence-based approach

Optimizing compilers for modern architectures: a dependence-based approach
Scheduling and Automatic Parallelization

Scheduling and Automatic Parallelization
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
A Machine Learning Approach to Automatic Production of Compiler Heuristics

AIMSA '02 Proceedings of the 10th International Conference on Artificial Intelligence: Methodology, Systems, and Applications
On the Optimality of Feautrier's Scheduling Algorithm

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Compiler optimization-space exploration

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Finding effective optimization phase sequences

Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Meta optimization: improving compiler heuristics with machine learning

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Code generation for multiple mappings

FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
Finding effective compilation sequences

Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Code Generation in the Polyhedral Model Is Easier Than You Think

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Space–time mapping and tiling: a helpful combination: Research Articles

Concurrency and Computation: Practice & Experience - Compilers for Parallel Computers
Fast and efficient searches for effective optimization-phase sequences

ACM Transactions on Architecture and Code Optimization (TACO)
A Heuristic Search Algorithm Based on Unified Transformation Framework

ICPPW '05 Proceedings of the 2005 International Conference on Parallel Processing Workshops
Lattice-Based Memory Allocation

IEEE Transactions on Computers
Using Machine Learning to Focus Iterative Optimization

Proceedings of the International Symposium on Code Generation and Optimization
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies

International Journal of Parallel Programming
VISTA: VPO interactive system for tuning applications

ACM Transactions on Embedded Computing Systems (TECS)
Systematic search within an optimisation space based on Unified Transformation Framework

International Journal of Computational Science and Engineering
Improving data locality by chunking

CC'03 Proceedings of the 12th international conference on Compiler construction
A practical method for quickly evaluating program optimizations

HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Hybrid optimizations: which optimization algorithm to use?

CC'06 Proceedings of the 15th international conference on Compiler Construction
Polyhedral code generation in the real world

CC'06 Proceedings of the 15th international conference on Compiler Construction

Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
A compiler framework for optimization of affine loop nests for gpgpus

Proceedings of the 22nd annual international conference on Supercomputing
Iterative optimization in the polyhedral model: part ii, multidimensional time

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A practical automatic polyhedral parallelizer and locality optimizer

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A tuning framework for software-managed memory hierarchies

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Slicing based code parallelization for minimizing inter-processor communication

CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model

CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
A GPGPU compiler for memory optimization and parallelism management

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Speeding up Nek5000 with autotuning and specialization

Proceedings of the 24th ACM International Conference on Supercomputing
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Auto-tuning full applications: A case study

International Journal of High Performance Computing Applications
Adaptive runtime selection of parallel schedules in the polytope model

Proceedings of the 19th High Performance Computing Symposia
Optimizing SDRAM bandwidth for custom FPGA loop accelerators

Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Loop transformation recipes for code generation and auto-tuning

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Automatic C-to-CUDA code generation for affine programs

CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Predictive modeling in a polyhedral optimization space

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
On-chip cache hierarchy-aware tile scheduling for multicore machines

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
A unified optimizing compiler framework for different GPGPU architectures

ACM Transactions on Architecture and Code Optimization (TACO)
POET: a scripting language for applying parameterized source-to-source program transformations

Software—Practice & Experience
A multi-objective auto-tuning framework for parallel codes

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A script-based autotuning compiler system to generate high-performance CUDA code

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Algorithmic species: A classification of affine loop nests for parallel programming

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Polyhedral-based data reuse optimization for configurable computing

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Fine-grained multi-phase array designs

Journal of Parallel and Distributed Computing
Revisiting loop fusion in the polyhedral framework

Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming

Quantified Score

Hi-index	0.00

Visualization

Abstract

Emerging microprocessors offer unprecedented parallel computing capabilities and deeper memory hierarchies, increasing the importance of loop transformations in optimizing compilers. Because compiler heuristics rely on simplistic performance models, and because they are bound to a limited set of transformations sequences, they only uncover a fraction of the peak performance on typical benchmarks. Iterative optimization is a maturing framework to address these limitations, but so far, it was not successfully applied complex loop transformation sequences because of the combinatorics of the optimization search space. We focus on the class of loop transformation which can be expressed as one-dimensional affine schedules. We define a systematic exploration method to enumerate the space of all legal, distinct transformations in this class. This method is based on an upstream characterization, as opposed to state-of-the-art downstream filtering approaches. Our results demonstrate orders of magnitude improvements in the size of the search space and in the convergence speed of a dedicated iterative optimization heuristic.