The availability of large distributed memory multiprocessors and of parallel computers with complex memory hierarchies has left the programmer with the difficult task of planning the detailed parallel execution of a program. On distributed memory machines, for example, the programmer must manually distribute code and data and must also manage communication among tasks explicitly. Current work on compiler support for this task has focused on automating task partitioning, assuming a data partition that is either fixed or specified by the programmer through annotations. This paper argues that one reason for inefficient parallel execution is the lack of synergism between task partitioning and data partitioning and allocation. Because the inherent dependence structure of the computation is the source of that synergism, it should influence both data and task allocation. We present a methodology based on a unified approach to task and data partitioning. We show how to derive data and task partitions for computations expressed as nested loops that exhibit regular dependencies, and how to map these partitions onto distributed memory multiprocessors. Based on the mapping, we show how to derive code for the nodes of a distributed memory machine with appropriate message transmission constructs. We also discuss related communication optimizations.
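To make the idea of dependence-driven partitioning concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes a hypothetical loop nest `A[i][j] = A[i-1][j-1] + 1` with the single uniform dependence vector (1, 1), an owner-computes rule (each iteration runs on the processor that owns the element it writes), and cyclic mappings onto `num_procs` processors. All function names are invented for illustration. It compares a partition aligned with the dependence direction against a row-cyclic partition chosen without regard to the dependence.

```python
def diagonal_owner(i, j, num_procs):
    """Assumed mapping: element (i, j) lives on the processor of its diagonal i - j.

    Python's % yields a non-negative result for a positive modulus,
    so negative diagonals are mapped correctly.
    """
    return (i - j) % num_procs


def row_owner(i, j, num_procs):
    """Assumed mapping: rows are dealt out cyclically, ignoring the dependence."""
    return i % num_procs


def count_remote_reads(n, num_procs, owner):
    """Count iterations of the hypothetical loop nest
           for i in 1..n-1: for j in 1..n-1: A[i][j] = A[i-1][j-1] + 1
    whose read of A[i-1][j-1] is owned by a different processor than
    the one executing iteration (i, j) under the owner-computes rule.
    """
    remote = 0
    for i in range(1, n):
        for j in range(1, n):
            executing_proc = owner(i, j, num_procs)
            source_proc = owner(i - 1, j - 1, num_procs)
            if executing_proc != source_proc:
                remote += 1
    return remote


if __name__ == "__main__":
    n, num_procs = 8, 4
    print("diagonal partition:", count_remote_reads(n, num_procs, diagonal_owner))
    print("row-cyclic partition:", count_remote_reads(n, num_procs, row_owner))
```

In this sketch, the diagonal mapping keeps each dependence chain on one processor, so no iteration needs a remote element, whereas the row-cyclic mapping makes every iteration read from a neighboring processor. Real loop nests usually have several dependence vectors, and a communication-free partition may not exist for them.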