Iterative optimization in the polyhedral model: part ii, multidimensional time

Authors:
Louis-Noël Pouchet;Cédric Bastoul;Albert Cohen;John Cavazos
Affiliations:
ALCHEMY Group, INRIA Saclay -- Ile-de-France and Paris-Sud University, Orsay, France;ALCHEMY Group, INRIA Saclay -- Ile-de-France and Paris-Sud University, Orsay, France;ALCHEMY Group, INRIA Saclay -- Ile-de-France and Paris-Sud University, Orsay, France;Dept. of Computer & Information Sciences, University of Delaware, Newark, DE, USA
Venue:
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Year:
2008

Citing 36
Cited 25

Theory of linear and integer programming

Theory of linear and integer programming
The Omega test: a fast and practical integer programming algorithm for dependence analysis

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Some efficient solutions to the affine scheduling problem: I. One-dimensional time

International Journal of Parallel Programming
Transformations of nested loops with non-convex iteration spaces

Parallel Computing
Fuzzy array dataflow analysis

Journal of Parallel and Distributed Computing
Maximizing parallelism and minimizing synchronization with affine transforms

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Optimizing for reduced code space using genetic algorithms

Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Generation of Efficient Nested Loops from Polyhedra

International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Tiling imperfectly-nested loop nests

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Optimizing compilers for modern architectures: a dependence-based approach

Optimizing compilers for modern architectures: a dependence-based approach
Genetic Algorithms in Search, Optimization and Machine Learning

Genetic Algorithms in Search, Optimization and Machine Learning
Scheduling and Automatic Parallelization

Scheduling and Automatic Parallelization
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
Adaptive Optimizing Compilers for the 21st Century

The Journal of Supercomputing
GAPS: A Compiler Framework for Genetic Algorithm (GA) Optimised Parallelisation

HPCN Europe 1998 Proceedings of the International Conference and Exhibition on High-Performance Computing and Networking
On the Optimality of Feautrier's Scheduling Algorithm

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Meta optimization: improving compiler heuristics with machine learning

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Code generation for multiple mappings

FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation

PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Optimization within a unified transformation framework

Optimization within a unified transformation framework
Adaptive java optimisation using instance-based learning

Proceedings of the 18th annual international conference on Supercomputing
Code Generation in the Polyhedral Model Is Easier Than You Think

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
ACME: adaptive compilation made efficient

LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Fast and efficient searches for effective optimization-phase sequences

ACM Transactions on Architecture and Code Optimization (TACO)
Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms

International Journal of High Performance Computing Applications
Automatic Selection of Compiler Options Using Non-parametric Inferential Statistics

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Using Machine Learning to Focus Iterative Optimization

Proceedings of the International Symposium on Code Generation and Optimization
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies

International Journal of Parallel Programming
Parameterized tiled loops for free

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Proceedings of the International Symposium on Code Generation and Optimization
Automatic Correction of Loop Transformations

PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
A practical automatic polyhedral parallelizer and locality optimizer

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Systematic search within an optimisation space based on Unified Transformation Framework

International Journal of Computational Science and Engineering
Improving data locality by chunking

CC'03 Proceedings of the 12th international conference on Compiler construction
Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model

CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
Polyhedral code generation in the real world

CC'06 Proceedings of the 15th international conference on Compiler Construction

A practical automatic polyhedral parallelizer and locality optimizer

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A tuning framework for software-managed memory hierarchies

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Automating the generation of composed linear algebra kernels

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Speeding up Nek5000 with autotuning and specialization

Proceedings of the 24th ACM International Conference on Supercomputing
Partitioning streaming parallelism for multi-cores: a machine learning based approach

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Symbolic and analytic techniques for resource analysis of java bytecode

TGC'10 Proceedings of the 5th international conference on Trustworthly global computing
Loop transformations: convexity, pruning and optimization

Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Auto-tuning full applications: A case study

International Journal of High Performance Computing Applications
Adaptive runtime selection of parallel schedules in the polytope model

Proceedings of the 19th High Performance Computing Symposia
A transactional memory with automatic performance tuning

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Programmable data dependencies and placements

DAMP '12 Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming
Loop transformation recipes for code generation and auto-tuning

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
The polyhedral model is more widely applicable than you think

CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Predictive modeling in a polyhedral optimization space

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
A multi-objective auto-tuning framework for parallel codes

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A script-based autotuning compiler system to generate high-performance CUDA code

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
When polyhedral transformations meet SIMD code generation

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Fine-grained multi-phase array designs

Journal of Parallel and Distributed Computing
Using machine learning to partition streaming programs

ACM Transactions on Architecture and Code Optimization (TACO)
Revisiting loop fusion in the polyhedral framework

Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
High level transforms for SIMD and low-level computer vision algorithms

Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing
Leveraging GPUs using cooperative loop speculation

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

High-level loop optimizations are necessary to achieve good performance over a wide variety of processors. Their performance impact can be significant because they involve in-depth program transformations that aim to sustain a balanced workload over the computational, storage, and communication resources of the target architecture. Therefore, it is mandatory that the compiler accurately models the target architecture as well as the effects of complex code restructuring. However, because optimizing compilers (1) use simplistic performance models that abstract away many of the complexities of modern architectures, (2) rely on inaccurate dependence analysis, and (3) lack frameworks to express complex interactions of transformation sequences, they typically uncover only a fraction of the peak performance available on many applications. We propose a complete iterative framework to address these issues. We rely on the polyhedral model to construct and traverse a large and expressive search space. This space encompasses only legal, distinct versions resulting from the restructuring of any static control loop nest. We first propose a feedback-driven iterative heuristic tailored to the search space properties of the polyhedral model. Though, it quickly converges to good solutions for small kernels, larger benchmarks containing higher dimensional spaces are more challenging and our heuristic misses opportunities for significant performance improvement. Thus, we introduce the use of a genetic algorithm with specialized operators that leverage the polyhedral representation of program dependences. We provide experimental evidence that the genetic algorithm effectively traverses huge optimization spaces, achieving good performance improvements on large loop nests.