Minimizing Data and Synchronization Costs in One-Way Communication

Authors:
Mahmut Kandemir;Alok Choudhary;Prithviraj Banerjee;J. Ramanujam;Nagaraj Shenoy
Affiliations:
Pennsylvania State Univ., University Park;Northwestern Univ., Evanston, IL;Northwestern Univ., Evanston, IL;Louisiana State Univ., Baton Rouge;Northwestern Univ., Evanston, IL
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2000

Abstract

Minimizing communication and synchronization costs is crucial to realizing the performance potential of parallel computers. This paper presents a general technique that uses a global data-flow framework to optimize communication and synchronization in the context of the one-way communication model. In contrast to the conventional send/receive message-passing model, one-way communication is a new paradigm that decouples message transmission from synchronization. On parallel machines with appropriate low-level support, this may open up new opportunities not only to further optimize communication but also to reduce synchronization overhead. We present optimization techniques, built on our framework, for eliminating redundant data communication and synchronization operations. Our approach works with the most general data alignments and distributions in languages like High Performance Fortran (HPF) and combines traditional data-flow analysis with polyhedral algebra. Empirical results for several scientific benchmarks on a Cray T3E multiprocessor demonstrate that our approach successfully reduces the number of data (communication) and synchronization messages, thereby reducing overall execution times.
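To make the idea of eliminating redundant data communication concrete, the following is a minimal sketch in Python. It is not the paper's framework: the abstract describes a global data-flow analysis combined with polyhedral algebra over HPF distributions, whereas this toy tracks plain sets of element indices along a single straight-line sequence of steps and does not model synchronization at all. The function name `plan_transfers`, the `"use"`/`"def"` step encoding, and the example index ranges are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical encoding (an assumption, not from the paper):
#   ("use", elems): this processor reads remote array elements `elems`
#   ("def", elems): the owner overwrites `elems`, so local copies go stale
Step = Tuple[str, Set[int]]


def plan_transfers(steps: Iterable[Step]) -> List[Set[int]]:
    """Return, for each "use" step, the elements that still need a transfer."""
    available: Set[int] = set()  # remote elements held locally and still valid
    plan: List[Set[int]] = []
    for kind, elems in steps:
        if kind == "use":
            missing = elems - available  # skip whatever is already valid here
            plan.append(missing)
            available |= missing
        elif kind == "def":
            available -= elems  # overwritten elements must be fetched again
        else:
            raise ValueError(f"unknown step kind: {kind!r}")
    return plan


steps: List[Step] = [
    ("use", set(range(0, 8))),   # first read: nothing is available yet
    ("use", set(range(4, 12))),  # overlaps the first read on elements 4..7
    ("def", {5, 6}),             # owner updates two elements
    ("use", set(range(4, 8))),   # only the updated elements are stale
]

plan = plan_transfers(steps)
naive_volume = sum(len(elems) for kind, elems in steps if kind == "use")
optimized_volume = sum(len(missing) for missing in plan)
print(naive_volume, optimized_volume)
```

In this sketch the second read transfers only elements 8 to 11 and the third only elements 5 and 6, so the transferred volume drops from 20 elements to 14. A real compiler would reason about symbolic array sections across loops and branches rather than enumerating indices, and in the one-way model it would also decide where synchronization operations can be dropped.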