An important challenge in distributed computing is automating the selection of the parameters that control a distributed computation. One performance-critical parameter is the grain size of the computation, i.e., the interval between successive synchronization points in the application. This parameter is hard to select because it depends on both compile-time components (loop structure, data dependences, computational complexity) and run-time components (speed of the compute nodes and the network). On networks of workstations that are shared with other users, the run-time parameters can change over time. As a result, grain size selection must also account for its interaction with dynamic load balancing, which is needed to achieve good performance in this environment. In this paper we present a method for automatically selecting the grain size of computations consisting of nested DO loops. The method is based on close cooperation between the compiler and the runtime system. We evaluate the method using both simulation and measurements of an implementation on the Nectar multicomputer.
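The trade-off behind grain size selection can be illustrated with a toy cost model. This is only a sketch and is not the method presented in the paper: the function name `select_grain_size`, its parameters, and the criterion of bounding synchronization overhead by a fixed fraction of the compute time are all assumptions made for illustration. The per-iteration time stands in for the combined effect of the compile-time cost estimate and the current node speed, and the synchronization overhead stands in for the network cost.

```python
import math


def select_grain_size(iter_time_s, sync_overhead_s, max_overhead_fraction, total_iters):
    """Return a grain size, in loop iterations per synchronization interval.

    Hypothetical toy model (not the paper's algorithm): pick the smallest
    number of iterations g such that the fixed cost of one synchronization
    is at most `max_overhead_fraction` of the compute time g * iter_time_s.
    """
    if iter_time_s <= 0 or max_overhead_fraction <= 0 or total_iters < 1:
        raise ValueError("iter_time_s, max_overhead_fraction and total_iters must be positive")
    if sync_overhead_s < 0:
        raise ValueError("sync_overhead_s must be non-negative")

    # sync_overhead_s <= max_overhead_fraction * g * iter_time_s, solved for g
    g = math.ceil(sync_overhead_s / (max_overhead_fraction * iter_time_s))

    # A grain covers at least one iteration and at most the whole loop.
    return max(1, min(g, total_iters))


# The iteration time would come from a compile-time cost estimate scaled by
# the measured node speed; the values below are made up for the example.
fast_node = select_grain_size(0.5, 4.0, 0.25, 1000)
slow_node = select_grain_size(1.0, 4.0, 0.25, 1000)
short_loop = select_grain_size(0.5, 4.0, 0.25, 10)
```

In this model a slower node needs fewer iterations per interval to amortize the same synchronization cost, which is one reason a grain size fixed at compile time can become a poor choice when the load on shared workstations changes. A runtime system could re-evaluate such a function whenever its measurements of node or network speed change.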