Compile-time partitioning and scheduling of parallel programs
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Run-time scheduling and execution of loops on message passing machines
Journal of Parallel and Distributed Computing - Special issue: algorithms for hypercube computers
Run-Time Parallelization and Scheduling of Loops
IEEE Transactions on Computers
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
A practical algorithm for exact array dependence analysis
Communications of the ACM
Improving the performance of runtime parallelization
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Runtime compilation techniques for data partitioning and communication schedule reuse
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Some efficient solutions to the affine scheduling problem: I. One-dimensional time
International Journal of Parallel Programming
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs
International Journal of Parallel Programming
Loop parallelization algorithms: from parallelism extraction to code generation
Parallel Computing - Special issues on languages and compilers for parallel computers
Maximizing parallelism and minimizing synchronization with affine partitions
Parallel Computing - Special issues on languages and compilers for parallel computers
An affine partitioning algorithm to maximize parallelism and minimize communication
ICS '99 Proceedings of the 13th international conference on Supercomputing
Static scheduling algorithms for allocating directed task graphs to multiprocessors
ACM Computing Surveys (CSUR)
Generation of Efficient Nested Loops from Polyhedra
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
A comparison of list schedules for parallel processing systems
Communications of the ACM
Blocking and array contraction across arbitrarily nested loops using affine partitioning
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Partitioning and Scheduling Parallel Programs for Multiprocessors
Partitioning and Scheduling Parallel Programs for Multiprocessors
An efficient algorithm for the run-time parallelization of DOACROSS loops
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
On the Granularity and Clustering of Directed Acyclic Task Graphs
IEEE Transactions on Parallel and Distributed Systems
Automatic Parallelization in the Polytope Model
The Data Parallel Programming Model: Foundations, HPF Realization, and Scientific Applications
Toward efficient and robust software speculative parallelization on multiprocessors
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving parallelism and data locality with affine partitioning
Improving parallelism and data locality with affine partitioning
Code Generation in the Polyhedral Model Is Easier Than You Think
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
International Journal of Parallel Programming
Proceedings of the 20th annual international conference on Supercomputing
Sensitivity analysis for automatic parallelization on multi-cores
Proceedings of the 21st annual international conference on Supercomputing
A practical automatic polyhedral parallelizer and locality optimizer
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
Polyhedral code generation in the real world
CC'06 Proceedings of the 15th international conference on Compiler Construction
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Multithreaded Geant4: semi-automatic transformation into scalable thread-parallel software
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
A hybrid approach of OpenMP for clusters
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Automatically tuning parallel and parallelized programs
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
On-chip cache hierarchy-aware tile scheduling for multicore machines
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Parcae: a system for flexible parallel execution
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Automatic OpenMP loop scheduling: a combined compiler and runtime approach
IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Tiling stencil computations to maximize parallelism
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
From serial loops to parallel execution on distributed systems
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Sigma*: symbolic learning of input-output specifications
POPL '13 Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proof-Directed Parallelization Synthesis by Separation Logic
ACM Transactions on Programming Languages and Systems (TOPLAS)
Semi-automatic restructuring of offloadable tasks for many-core accelerators
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Compiling affine loop nests for distributed-memory parallel architectures
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
Recent advances in polyhedral compilation technology have made it feasible to automatically transform affine sequential loop nests for tiled parallel execution on multi-core processors. However, for multi-statement input programs with statements of different dimensionalities, such as Cholesky or LU decomposition, the parallel tiled code generated by existing automatic parallelization approaches may suffer from significant load imbalance, resulting in poor scalability on multi-core systems. In this paper, we develop a completely automatic parallelization approach for transforming input affine sequential codes into efficient parallel codes that can be executed on a multi-core system in a load-balanced manner. In our approach, we employ a compile-time technique that enables dynamic extraction of inter-tile dependences at run-time, and dynamic scheduling of the parallel tiles on the processor cores for improved scalable execution. Our approach obviates the need for programmer intervention and re-writing of existing algorithms for efficient parallel execution on multi-cores. We demonstrate the usefulness of our approach through comparisons using linear algebra computations: LU and Cholesky decomposition.