Theory of linear and integer programming
Theory of linear and integer programming
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Improving locality and parallelism in nested loops
Improving locality and parallelism in nested loops
Some efficient solutions to the affine scheduling problem: I. One-dimensional time
International Journal of Parallel Programming
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Maximizing parallelism and minimizing synchronization with affine transforms
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Generation of Efficient Nested Loops from Polyhedra
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Scheduling and Automatic Parallelization
Scheduling and Automatic Parallelization
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
A Machine Learning Approach to Automatic Production of Compiler Heuristics
AIMSA '02 Proceedings of the 10th International Conference on Artificial Intelligence: Methodology, Systems, and Applications
On the Optimality of Feautrier's Scheduling Algorithm
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Compiler optimization-space exploration
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Finding effective optimization phase sequences
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Meta optimization: improving compiler heuristics with machine learning
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Code generation for multiple mappings
FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
Finding effective compilation sequences
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Code Generation in the Polyhedral Model Is Easier Than You Think
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Space–time mapping and tiling: a helpful combination: Research Articles
Concurrency and Computation: Practice & Experience - Compilers for Parallel Computers
Fast and efficient searches for effective optimization-phase sequences
ACM Transactions on Architecture and Code Optimization (TACO)
A Heuristic Search Algorithm Based on Unified Transformation Framework
ICPPW '05 Proceedings of the 2005 International Conference on Parallel Processing Workshops
Lattice-Based Memory Allocation
IEEE Transactions on Computers
Using Machine Learning to Focus Iterative Optimization
Proceedings of the International Symposium on Code Generation and Optimization
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
International Journal of Parallel Programming
VISTA: VPO interactive system for tuning applications
ACM Transactions on Embedded Computing Systems (TECS)
Systematic search within an optimisation space based on Unified Transformation Framework
International Journal of Computational Science and Engineering
Improving data locality by chunking
CC'03 Proceedings of the 12th international conference on Compiler construction
A practical method for quickly evaluating program optimizations
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Hybrid optimizations: which optimization algorithm to use?
CC'06 Proceedings of the 15th international conference on Compiler Construction
Polyhedral code generation in the real world
CC'06 Proceedings of the 15th international conference on Compiler Construction
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
A compiler framework for optimization of affine loop nests for gpgpus
Proceedings of the 22nd annual international conference on Supercomputing
Iterative optimization in the polyhedral model: part ii, multidimensional time
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A practical automatic polyhedral parallelizer and locality optimizer
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A tuning framework for software-managed memory hierarchies
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Slicing based code parallelization for minimizing inter-processor communication
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
A GPGPU compiler for memory optimization and parallelism management
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Speeding up Nek5000 with autotuning and specialization
Proceedings of the 24th ACM International Conference on Supercomputing
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Auto-tuning full applications: A case study
International Journal of High Performance Computing Applications
Adaptive runtime selection of parallel schedules in the polytope model
Proceedings of the 19th High Performance Computing Symposia
Optimizing SDRAM bandwidth for custom FPGA loop accelerators
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Loop transformation recipes for code generation and auto-tuning
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Automatic C-to-CUDA code generation for affine programs
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Predictive modeling in a polyhedral optimization space
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
On-chip cache hierarchy-aware tile scheduling for multicore machines
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
A unified optimizing compiler framework for different GPGPU architectures
ACM Transactions on Architecture and Code Optimization (TACO)
POET: a scripting language for applying parameterized source-to-source program transformations
Software—Practice & Experience
A multi-objective auto-tuning framework for parallel codes
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A script-based autotuning compiler system to generate high-performance CUDA code
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Algorithmic species: A classification of affine loop nests for parallel programming
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Polyhedral-based data reuse optimization for configurable computing
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Fine-grained multi-phase array designs
Journal of Parallel and Distributed Computing
Revisiting loop fusion in the polyhedral framework
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
Emerging microprocessors offer unprecedented parallel computing capabilities and deeper memory hierarchies, increasing the importance of loop transformations in optimizing compilers. Because compiler heuristics rely on simplistic performance models, and because they are bound to a limited set of transformations sequences, they only uncover a fraction of the peak performance on typical benchmarks. Iterative optimization is a maturing framework to address these limitations, but so far, it was not successfully applied complex loop transformation sequences because of the combinatorics of the optimization search space. We focus on the class of loop transformation which can be expressed as one-dimensional affine schedules. We define a systematic exploration method to enumerate the space of all legal, distinct transformations in this class. This method is based on an upstream characterization, as opposed to state-of-the-art downstream filtering approaches. Our results demonstrate orders of magnitude improvements in the size of the search space and in the convergence speed of a dedicated iterative optimization heuristic.