Theory of linear and integer programming
Theory of linear and integer programming
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Some efficient solutions to the affine scheduling problem: I. One-dimensional time
International Journal of Parallel Programming
A singular loop transformation framework based on non-singular matrices
International Journal of Parallel Programming
A unifying framework for iteration reordering transformations
A unifying framework for iteration reordering transformations
Communication-minimal tiling of uniform dependence loops
Journal of Parallel and Distributed Computing
Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs
International Journal of Parallel Programming
Loop parallelization algorithms: from parallelism extraction to code generation
Parallel Computing - Special issues on languages and compilers for parallel computers
Maximizing parallelism and minimizing synchronization with affine partitions
Parallel Computing - Special issues on languages and compilers for parallel computers
An affine partitioning algorithm to maximize parallelism and minimize communication
ICS '99 Proceedings of the 13th international conference on Supercomputing
Generation of Efficient Nested Loops from Polyhedra
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Blocking and array contraction across arbitrarily nested loops using affine partitioning
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests
International Journal of Parallel Programming
Automatic Blocking of Nested Loops
Automatic Blocking of Nested Loops
Code Generation in the Polyhedral Model Is Easier Than You Think
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Facilitating the search for compositions of program transformations
Proceedings of the 19th annual international conference on Supercomputing
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
International Journal of Parallel Programming
Proceedings of the 20th annual international conference on Supercomputing
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time
Proceedings of the International Symposium on Code Generation and Optimization
Polyhedral code generation in the real world
CC'06 Proceedings of the 15th international conference on Compiler Construction
A compiler framework for optimization of affine loop nests for gpgpus
Proceedings of the 22nd annual international conference on Supercomputing
Iterative optimization in the polyhedral model: part ii, multidimensional time
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A practical automatic polyhedral parallelizer and locality optimizer
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Precise Management of Scratchpad Memories for Localising Array Accesses in Scientific Codes
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Compact multi-dimensional kernel extraction for register tiling
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
A model for fusion and code motion in an automatic parallelizing compiler
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Loop transformations: convexity, pruning and optimization
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Parallel memory prediction for fused linear algebra kernels
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Fault oblivious eXascale whitepaper
Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers
Optimizing data locality using array tiling
Proceedings of the International Conference on Computer-Aided Design
A data layout optimization framework for NUCA-based multicores
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Automatic C-to-CUDA code generation for affine programs
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Predictive modeling in a polyhedral optimization space
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
On-chip cache hierarchy-aware tile scheduling for multicore machines
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Automatic privatization for parallel execution of loops
ICAISC'12 Proceedings of the 11th international conference on Artificial Intelligence and Soft Computing - Volume Part II
Tiling stencil computations to maximize parallelism
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Improving last level cache locality by integrating loop and data transformations
Proceedings of the International Conference on Computer-Aided Design
Improving high level synthesis optimization opportunity through polyhedral transformations
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
PolyGLoT: a polyhedral loop transformation framework for a graphical dataflow language
CC'13 Proceedings of the 22nd international conference on Compiler Construction
Compiling affine loop nests for distributed-memory parallel architectures
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Reshaping cache misses to improve row-buffer locality in multicore systems
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Revisiting loop fusion in the polyhedral framework
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
The polyhedral model provides powerful abstractions to optimize loop nests with regular accesses. Affine transformations in this model capture a complex sequence of execution-reordering loop transformations that can improve performance by parallelization as well as locality enhancement. Although a significant body of research has addressed affine scheduling and partitioning, the problem of automaticallyfinding good affine transforms forcommunication-optimized coarsegrained parallelization together with locality optimization for the general case of arbitrarily-nested loop sequences remains a challenging problem. We propose an automatic transformation framework to optimize arbitrarilynested loop sequences with affine dependences for parallelism and locality simultaneously. The approach finds good tiling hyperplanes by embedding a powerful and versatile cost function into an Integer Linear Programming formulation. These tiling hyperplanes are used for communication-minimized coarse-grained parallelization as well as for locality optimization. The approach enables the minimization of inter-tile communication volume in the processor space, and minimization of reuse distances for local execution at each node. Programs requiring one-dimensional versusmulti-dimensional time schedules (with scheduling-based approaches) are all handled with the same algorithm. Synchronization-free parallelism, permutable loops or pipelined parallelismat various levels can be detected. Preliminary studies of the framework show promising results.