This paper presents a new method that a parallelizing compiler can apply to find, without user intervention, the iteration and data decompositions that minimize communication and load imbalance overheads in parallel programs targeted at NUMA architectures. One of the key ingredients in our approach is the representation of locality as a Locality-Communication Graph (LCG) and the formulation of the compiler technique as a Mixed Integer Nonlinear Programming (MINLP) optimization problem on this graph. The objective function and constraints of the optimization problem model communication costs and load imbalance. The solution to this optimization problem is a decomposition that minimizes the parallel execution overhead. This paper summarizes how the compiler extracts the locality information from nonannotated code and focuses on how the compiler can derive the optimization problem, solve it, and generate the parallel code with the automatically selected iteration and data distributions. In addition, we include a discussion of our model and the solutions (the decompositions) that it provides. The approach is evaluated using several benchmarks. The experimental results demonstrate that the MINLP formulation does not increase compilation time significantly and that our framework generates very efficient iteration/data distributions for a variety of NUMA machines.
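The idea of choosing a decomposition by minimizing communication plus load imbalance over a graph can be illustrated with a toy sketch. It is not the paper's LCG or its MINLP formulation: the phase names, the two candidate distributions, and all cost values below are invented for illustration, and exhaustive enumeration stands in for a real solver.

```python
from itertools import product

# Hypothetical toy model, NOT the paper's LCG or MINLP formulation.
# Each node is a loop phase; each phase picks one distribution.
DISTRIBUTIONS = ("BLOCK", "CYCLIC")

# Assumed load-imbalance cost of each phase under each distribution.
IMBALANCE = {
    "phase1": {"BLOCK": 0.0, "CYCLIC": 0.0},
    "phase2": {"BLOCK": 4.0, "CYCLIC": 1.0},  # e.g. a triangular loop
    "phase3": {"BLOCK": 0.0, "CYCLIC": 2.0},
}

# Assumed communication cost paid on an edge when the two
# connected phases use different distributions.
REDISTRIBUTION = {
    ("phase1", "phase2"): 2.0,
    ("phase2", "phase3"): 5.0,
}


def total_cost(assignment):
    """Overhead = load imbalance of every phase + communication on mismatched edges."""
    imbalance = sum(IMBALANCE[p][d] for p, d in assignment.items())
    comm = sum(
        cost
        for (src, dst), cost in REDISTRIBUTION.items()
        if assignment[src] != assignment[dst]
    )
    return imbalance + comm


def best_decomposition():
    """Enumerate every assignment and keep the cheapest (brute force, not MINLP)."""
    phases = sorted(IMBALANCE)
    candidates = (
        dict(zip(phases, choice))
        for choice in product(DISTRIBUTIONS, repeat=len(phases))
    )
    return min(candidates, key=total_cost)


if __name__ == "__main__":
    best = best_decomposition()
    print(best, total_cost(best))
```

With these made-up costs, keeping every phase CYCLIC avoids all redistribution and gives the lowest total overhead. Exhaustive search grows exponentially with the number of phases, which is one reason a real compiler would pose the choice as an optimization problem for a solver.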