Minimizing communication and synchronization costs is crucial to realizing the performance potential of parallel computers. This paper presents a general technique that uses a global data-flow framework to optimize communication and synchronization under the one-way communication model. Unlike the conventional send/receive message-passing model, one-way communication decouples message transmission from synchronization. On parallel machines with appropriate low-level support, this decoupling opens up new opportunities both to optimize communication further and to reduce synchronization overhead. Using our framework, we present optimization techniques that eliminate redundant data communication and redundant synchronization operations. Our approach handles the most general data alignments and distributions found in languages such as High Performance Fortran (HPF), and it combines traditional data-flow analysis with polyhedral algebra. Empirical results for several scientific benchmarks on a Cray T3E multiprocessor show that our approach reduces the number of data (communication) messages and synchronization messages, and thereby reduces overall execution time.
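To make the idea of redundant communication and synchronization concrete, the following is a minimal sketch and not the paper's algorithm. It assumes straight-line code between one pair of processors and tracks explicit element indices, whereas the paper describes a global data-flow framework combined with polyhedral algebra over HPF distributions. The operation names (`write`, `put`, `sync`) and the function `eliminate_redundant` are hypothetical and chosen only for illustration.

```python
def eliminate_redundant(ops):
    """Forward pass over a straight-line list of operations.

    Hypothetical operation encoding (an assumption of this sketch):
      ("write", name, indices)  local update of array elements
      ("put", name, indices)    one-way transfer of elements to the remote node
      ("sync",)                 synchronization that completes earlier puts
    """
    available = set()  # (name, index) pairs whose remote copy is up to date
    pending = False    # True if a put has been issued since the last sync
    out = []
    for op in ops:
        kind = op[0]
        if kind == "write":
            _, name, indices = op
            # A local write invalidates the remote copy of those elements.
            available -= {(name, i) for i in indices}
            out.append(op)
        elif kind == "put":
            _, name, indices = op
            # Transfer only the elements whose remote copy is stale.
            needed = sorted(i for i in indices if (name, i) not in available)
            if needed:
                out.append(("put", name, needed))
                available |= {(name, i) for i in needed}
                pending = True
        elif kind == "sync":
            # A sync with no outstanding put has nothing to complete.
            if pending:
                out.append(op)
                pending = False
        else:
            raise ValueError(f"unknown operation: {kind!r}")
    return out


ops = [
    ("write", "A", range(0, 8)),
    ("put", "A", range(0, 8)),
    ("sync",),
    ("put", "A", range(0, 4)),   # redundant: A[0..3] is unchanged since the last put
    ("sync",),                   # redundant: no put is outstanding
    ("write", "A", range(2, 4)),
    ("put", "A", range(0, 8)),   # only A[2] and A[3] are stale
    ("sync",),
]
optimized = eliminate_redundant(ops)
```

Because transfer and synchronization are separate operations in this model, the sketch can drop or shrink each one independently. With paired send/receive operations, removing a transfer would also remove the synchronization it implies.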