Computer
An overview of the PTRAN analysis system for multiprocessing
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and Environments for Parallel Programming
An efficient method of computing static single assignment form
POPL '89 Proceedings of the 16th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
A parallelizing compiler for distributed memory parallel computers
Run-time scheduling and execution of loops on message passing machines
Journal of Parallel and Distributed Computing - Special issue: algorithms for hypercube computers
Updating distributed variables in local computations
Concurrency: Practice and Experience
Compiling programs for a linear systolic array
PLDI '90 Proceedings of the ACM SIGPLAN 1990 Conference on Programming Language Design and Implementation
Supporting shared data structures on distributed memory architectures
PPOPP '90 Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Compiler optimizations for Fortran D on MIMD distributed-memory machines
Proceedings of the 1991 ACM/IEEE Conference on Supercomputing
Compiling programs for nonshared memory machines
Pandore: a system to manage data distribution
ICS '90 Proceedings of the 4th International Conference on Supercomputing
Data-Parallel Programming on Multicomputers
IEEE Software
Data-Parallel Programming on MIMD Computers
IEEE Transactions on Parallel and Distributed Systems
Programming SIMPLE for Parallel Portability
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Compiler Parallelization of SIMPLE for a Distributed Memory Machine
A systolic array optimizing compiler
Compiling for locality of reference
Compile time techniques for parallel execution of loops on distributed memory multiprocessors
Optimizing parallel programs with explicit synchronization
PLDI '95 Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation
Index array flattening through program transformation
Supercomputing '95 Proceedings of the 1995 ACM/IEEE Conference on Supercomputing
Optimal tile size adjustment in compiling general DOACROSS loop nests
ICS '95 Proceedings of the 9th International Conference on Supercomputing
A Space-Time Representation Method of Iterative Algorithms for the Design of Processor Arrays
Journal of VLSI Signal Processing Systems
Runtime and compiler support for irregular computations
Compiler optimizations for scalable parallel systems
Automatic data and computation decomposition on distributed memory parallel computers
ACM Transactions on Programming Languages and Systems (TOPLAS)
IEEE Transactions on Parallel and Distributed Systems
Mobile Agents - The Right Vehicle for Distributed Sequential Computing
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Effective communication coalescing for data-parallel applications
Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Using data replication to reduce communication energy on chip multiprocessors
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
2D data locality: definition, abstraction, and application
ICCAD '05 Proceedings of the 2005 IEEE/ACM International Conference on Computer-Aided Design
Transparent runtime parallelization of the R scripting language
Journal of Parallel and Distributed Computing
The lack of high-level languages and good compilers for parallel machines hinders their widespread acceptance and use. Programmers must address issues such as process decomposition, synchronization, and load balancing. We have developed a parallelizing compiler that, given a sequential program and a memory layout of its data, performs process decomposition while balancing parallelism against locality of reference. A process decomposition is obtained by specializing the program for each processor to the data that resides on that processor. If this analysis fails, the compiler falls back to a simple but inefficient scheme called run-time resolution, in which each process determines its role in the computation at run time by examining the data required for execution. Thus, our approach to process decomposition is data-driven rather than program-driven. We discuss several message optimizations that address the issues of overhead and synchronization in message transmission. Accumulation reorganizes the computation of a commutative and associative operator to reduce message traffic. Pipelining sends a value as soon as possible after its computation to increase parallelism. Vectorization of messages combines messages with the same source and the same destination to reduce overhead. Our experiments in parallelizing SIMPLE, a large hydrodynamics benchmark, for the Intel iPSC/2 show speedups within 60% to 70% of those of handwritten code.
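The run-time resolution scheme and the vectorization-of-messages optimization described above can be illustrated with a small simulation. This is only a sketch: the block distribution, the example loop a[i] = b[i-1] + b[i], and the process count are illustrative assumptions, not the compiler's actual code.

```python
from collections import defaultdict

P = 4          # number of simulated processes (illustrative)
N = 16         # problem size (illustrative)

def owner(i):
    """Block distribution: the process that owns element i (assumed layout)."""
    return min(i * P // N, P - 1)

# Distributed arrays: each process holds only the elements it owns.
b_local = [{i: float(i) for i in range(N) if owner(i) == p} for p in range(P)]
a_local = [dict() for _ in range(P)]

# Communication phase: for a[i] = b[i-1] + b[i], the owner of a[i] needs
# b[i-1]. All values with the same source and destination are batched into
# a single message -- the "vectorization of messages" from the abstract.
outbox = defaultdict(list)                 # (src, dst) -> [(index, value), ...]
for p in range(P):
    for i in range(1, N):
        if owner(i - 1) == p and owner(i) != p:
            outbox[(p, owner(i))].append((i - 1, b_local[p][i - 1]))

received = [dict() for _ in range(P)]
for (src, dst), msg in outbox.items():
    received[dst].update(msg)              # deliver one combined message

# Run-time resolution: every process scans the whole iteration space but
# performs an assignment only for the elements it owns (owner-computes rule).
for p in range(P):
    for i in range(1, N):
        if owner(i) != p:
            continue                       # not my element: skip the assignment
        b_prev = b_local[p].get(i - 1, received[p].get(i - 1))
        a_local[p][i] = b_prev + b_local[p][i]

# Gather the distributed result for inspection.
a = {}
for p in range(P):
    a.update(a_local[p])
```

With this layout only three messages cross process boundaries (one per block edge), each carrying every value needed between that source/destination pair, rather than one message per element.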