Theory of linear and integer programming
Theory of linear and integer programming
Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A practical algorithm for exact array dependence analysis
Communications of the ACM
Some efficient solutions to the affine scheduling problem: I. One-dimensional time
International Journal of Parallel Programming
A singular loop transformation framework based on non-singular matrices
International Journal of Parallel Programming
Integration, the VLSI Journal
A unifying framework for iteration reordering transformations
A unifying framework for iteration reordering transformations
Communication-minimal tiling of uniform dependence loops
Journal of Parallel and Distributed Computing
Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs
International Journal of Parallel Programming
Loop parallelization algorithms: from parallelism extraction to code generation
Parallel Computing - Special issues on languages and compilers for parallel computers
Maximizing parallelism and minimizing synchronization with affine partitions
Parallel Computing - Special issues on languages and compilers for parallel computers
New tiling techniques to improve cache temporal locality
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An affine partitioning algorithm to maximize parallelism and minimize communication
ICS '99 Proceedings of the 13th international conference on Supercomputing
Selecting tile shape for minimal execution time
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Loop tiling for parallelism
Generation of Efficient Nested Loops from Polyhedra
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Blocking and array contraction across arbitrarily nested loops using affine partitioning
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Scheduling and Automatic Parallelization
Scheduling and Automatic Parallelization
Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests
International Journal of Parallel Programming
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
On Time Optimal Supernode Shape
IEEE Transactions on Parallel and Distributed Systems
A comparison of empirical and model-driven optimization
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Code generation for multiple mappings
FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
Code Generation in the Polytope Model
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Automatic Blocking of Nested Loops
Automatic Blocking of Nested Loops
Transforming Complex Loop Nests for Locality
The Journal of Supercomputing
Code Generation in the Polyhedral Model Is Easier Than You Think
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Facilitating the search for compositions of program transformations
Proceedings of the 19th annual international conference on Supercomputing
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
International Journal of Parallel Programming
Implicit and explicit optimizations for stencil computations
Proceedings of the 2006 workshop on Memory system performance and correctness
Proceedings of the 20th annual international conference on Supercomputing
Parameterized tiled loops for free
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time
Proceedings of the International Symposium on Code Generation and Optimization
Multi-level tiling: M for the price of one
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Iterative optimization in the polyhedral model: part ii, multidimensional time
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Improving data locality by chunking
CC'03 Proceedings of the 12th international conference on Compiler construction
CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
Polyhedral code generation in the real world
CC'06 Proceedings of the 15th international conference on Compiler Construction
IEEE Transactions on Parallel and Distributed Systems
A compiler framework for optimization of affine loop nests for gpgpus
Proceedings of the 22nd annual international conference on Supercomputing
Iterative optimization in the polyhedral model: part ii, multidimensional time
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A tuning framework for software-managed memory hierarchies
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Generating Empirically Optimized Composed Matrix Kernels from MATLAB Prototypes
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Compact multi-dimensional kernel extraction for register tiling
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Improving parallelism and locality with asynchronous algorithms
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Parameterized tiling revisited
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Cache oblivious parallelograms in iterative stencil computations
Proceedings of the 24th ACM International Conference on Supercomputing
Processor virtualization and split compilation for heterogeneous multicore embedded systems
Proceedings of the 47th Design Automation Conference
A model for fusion and code motion in an automatic parallelizing compiler
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Multithreaded Geant4: semi-automatic transformation into scalable thread-parallel software
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
Data locality and parallelism optimization using a constraint-based approach
Journal of Parallel and Distributed Computing
Loop transformations: convexity, pruning and optimization
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Polyhedral Model Based Data Locality Optimization for Embedded Applications
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Automatic generation of fpga-specific pipelined accelerators
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
Fault oblivious eXascale whitepaper
Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers
Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Understanding stencil code performance on multicore architectures
Proceedings of the 8th ACM International Conference on Computing Frontiers
Adaptive runtime selection of parallel schedules in the polytope model
Proceedings of the 19th High Performance Computing Symposia
Polyhedral parallelization of binary code
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Adapting the polyhedral model as a framework for efficient speculative parallelization
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Efficient tiled loop generation: D-tiling
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Automatic C-to-CUDA code generation for affine programs
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
The polyhedral model is more widely applicable than you think
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Predictive modeling in a polyhedral optimization space
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
On-chip cache hierarchy-aware tile scheduling for multicore machines
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Optimizing I/O for big array analytics
Proceedings of the VLDB Endowment
Optimizing memory hierarchy allocation with loop transformations for high-level synthesis
Proceedings of the 49th Annual Design Automation Conference
POET: a scripting language for applying parameterized source-to-source program transformations
Software—Practice & Experience
Automatic privatization for parallel execution of loops
ICAISC'12 Proceedings of the 11th international conference on Artificial Intelligence and Soft Computing - Volume Part II
Defensive loop tiling for multi-core processor
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
A compiler framework for extracting superword level parallelism
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Scan detection and parallelization in "inherently sequential" nested loop programs
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Hierarchical overlapped tiling
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Analytical bounds for optimal tile size selection
CC'12 Proceedings of the 21st international conference on Compiler Construction
VMAD: an advanced dynamic program analysis and instrumentation framework
CC'12 Proceedings of the 21st international conference on Compiler Construction
Apricot: an optimizing compiler and productivity tool for x86-compatible many-core coprocessors
Proceedings of the 26th ACM international conference on Supercomputing
Distributed Shared Memory and Compiler-Induced Scalable Locality for Scalable Cluster Performance
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Using free scheduling for programming graphic cards
Facing the Multicore-Challenge II
Automatic OpenMP loop scheduling: a combined compiler and runtime approach
IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Extendable pattern-oriented optimization directives
ACM Transactions on Architecture and Code Optimization (TACO)
Layout-oblivious optimization for matrix computations
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Automatic extraction of multi-objective aware pipeline parallelism using genetic algorithms
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A data dependence test based on the projection of paths over shape graphs
Journal of Parallel and Distributed Computing
Patus for convenient high-performance stencils: evaluation in earthquake simulations
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Tiling stencil computations to maximize parallelism
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Code generation for parallel execution of a class of irregular loops on distributed memory systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
FPGA-specific synthesis of loop-nests with pipelined computational cores
Microprocessors & Microsystems
Layout-oblivious compiler optimization for matrix computations
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Improved loop tiling based on the removal of spurious false dependences
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Polyhedral parallel code generation for CUDA
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
From serial loops to parallel execution on distributed systems
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Sub-polyhedral scheduling using (unit-)two-variable-per-inequality polyhedra
POPL '13 Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Improving high level synthesis optimization opportunity through polyhedral transformations
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
C-to-CoRAM: compiling perfect loop nests to the portable CoRAM abstraction
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Data layout optimization for GPGPU architectures
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Automatic speculative parallelization of loops using polyhedral dependence analysis
Proceedings of the First International Workshop on Code OptimiSation for MultI and many Cores
Split tiling for GPUs: automatic parallelization using trapezoidal tiles
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
A stencil compiler for short-vector SIMD architectures
Proceedings of the 27th international ACM conference on International conference on supercomputing
Polyhedral model based mapping optimization of loop nests for CGRAs
Proceedings of the 50th Annual Design Automation Conference
Fine-grained multi-phase array designs
Journal of Parallel and Distributed Computing
Semantics-preserving data layout transformations for improved vectorisation
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Compiling affine loop nests for distributed-memory parallel architectures
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Generating efficient data movement code for heterogeneous architectures with distributed-memory
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Automatic data allocation and buffer management for multi-GPU machines
ACM Transactions on Architecture and Code Optimization (TACO)
Adaptive Mapping and Parameter Selection Scheme to Improve Automatic Code Generation for GPUs
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
ACM Transactions on Architecture and Code Optimization (TACO)
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
ACM Transactions on Architecture and Code Optimization (TACO)
Improving polyhedral code generation for high-level synthesis
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Automatic extraction of pipeline parallelism for embedded heterogeneous multi-core platforms
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Recovering memory access patterns of executable programs
Science of Computer Programming
A Case Study of Implementing Supernode Transformations
International Journal of Parallel Programming
Hi-index | 0.00 |
We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model -- far beyond what is possible by current production compilers. Unlike previous works, our approach is an end-to-end fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented into a tool to automatically generate OpenMP parallel code from C program sections. Experimental results from the tool show very high speedups for local and parallel execution on multi-cores over state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.