MULTILISP: a language for concurrent symbolic computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
Implementation and performance of Munin
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Orca: a language for parallel programming of distributed systems
IEEE Transactions on Software Engineering
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Evaluating the basic performance of the Intel iPSC/860 parallel computer
Concurrency: Practice and Experience
The design and analysis of DASH: a scalable directory-based multiprocessor
Semantic foundations of Jade
POPL '92 Proceedings of the 19th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Heterogeneous parallel programming in Jade
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Communication optimization and code generation for distributed memory machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Computation migration: enhancing locality for distributed-memory parallel systems
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Data locality and load balancing in COOL
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Exploiting the memory hierarchy in sequential and parallel sparse Cholesky factorization
Software caching and computation migration in Olden
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
The design, implementation and evaluation of Jade: a portable, implicitly parallel programming language
Application-specific protocols for user-level shared memory
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Tempest: a substrate for portable parallel programs
COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
Improving processor and cache locality in fine-grain parallel computations using object-affinity scheduling and continuation passing
Commutativity analysis: a new analysis framework for parallelizing compilers
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Run-time compilation for parallel sparse matrix computations
ICS '96 Proceedings of the 10th international conference on Supercomputing
Commutativity analysis: a new analysis technique for parallelizing compilers
ACM Transactions on Programming Languages and Systems (TOPLAS)
Given the large communication overheads characteristic of modern parallel machines, optimizations that eliminate, hide, or parallelize communication can improve the performance of parallel computations. This paper describes our experience automatically applying communication optimizations in the context of Jade, a portable, implicitly parallel programming language designed for exploiting task-level concurrency. Jade programmers start with a program written in a standard serial, imperative language, then use Jade constructs to declare how parts of the program access data. The Jade implementation uses this data access information to automatically extract the concurrency and apply communication optimizations. Jade implementations exist for both shared-memory and message-passing machines; each implementation applies the communication optimizations appropriate for the machine on which it runs. We present performance results for several Jade applications running on both a shared-memory machine (the Stanford DASH machine) and a message-passing machine (the Intel iPSC/860), and use these results to characterize the overall performance impact of the communication optimizations. For our application set, the most important optimizations are replicating data for concurrent read access and improving locality by placing tasks close to the data they access. Broadcasting widely accessed data has a significant performance impact on one application; other optimizations, such as concurrently fetching remote data and overlapping computation with communication, have no effect.
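
The abstract's description of the programming model is concrete enough to sketch. The fragment below is a minimal, illustrative Jade-style example, paraphrased from the published Jade papers rather than taken from this one: it is ordinary serial C except for the withonly construct, whose access declarations (rd, wr) tell the implementation which shared objects the task will read and write. The array layout, loop, and routine update_column are hypothetical, and the sketch compiles only under a Jade front end, not a plain C compiler.

    /* Minimal Jade-style sketch (illustrative; syntax paraphrased from
       the published Jade papers, data structures hypothetical). */
    #define N 1024

    double shared *column[N];   /* shared objects that tasks will access */

    void factor(void)
    {
        int j;
        for (j = 0; j < N; j++) {
            /* Declare, before the task runs, that it reads and writes
               column[j]. The implementation uses exactly this access
               information to extract concurrency and to apply the
               communication optimizations the abstract lists:
               replication for concurrent readers, locality-based task
               placement, and broadcast of widely accessed data. */
            withonly { rd(column[j]); wr(column[j]); } do (column, j) {
                update_column(column[j]);  /* hypothetical serial routine */
            }
        }
    }

Because the access declarations are complete before each task executes, the implementation can, for example, run the task on the processor that already holds column[j] instead of moving the data, which is the locality optimization the abstract identifies as among the most important.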