Improving Compiler and Run-Time Support for Irregular Reductions Using Local Writes

Authors:
Hwansoo Han;Chau-Wen Tseng
Venue:
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Year:
1998

References: 17 (listed first below)
Cited by: 3 (listed after the references)

Compiling Fortran D for MIMD distributed-memory machines

Communications of the ACM
Preliminary experiences with the Fortran D compiler

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
GIVE-N-TAKE—a balanced code placement framework

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Communication optimizations for irregular scientific computations on distributed memory architectures

Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
Efficient support for irregular applications on distributed-memory machines

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Runtime and language support for compiling adaptive irregular programs on distributed-memory machines

Software—Practice & Experience
Detecting coarse-grain parallelism using an interprocedural parallelizing compiler

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
An integrated compile-time/run-time software distributed shared memory system

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler and software distributed shared memory support for irregular applications

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimizing communication in HPF programs on fine-grain distributed shared memory

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Evaluating the Performance of Software Distributed Shared Memory as a Target for Parallelizing Compilers

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Enhancing Software DSM for Compiler-Parallelized Applications

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Exploiting spatial regularity in irregular iterative applications

IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Initial Results for Glacial Variable Analysis

LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Compile-time Synchronization Optimizations for Software DSMs

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Update Protocols and Iterative Scientific Applications

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Compiler Optimization of Implicit Reductions for Distributed Memory Multiprocessors

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium

On Automatic Parallelization of Irregular Reductions on Scalable Shared Memory Systems

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
On improving the performance of data partitioning oriented parallel irregular reductions

EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
An execution strategy and optimized runtime support for parallelizing irregular reductions on modern GPUs

Proceedings of the international conference on Supercomputing

Abstract

Current compilers for distributed-memory multiprocessors parallelize irregular reductions either by generating calls to sophisticated run-time systems (CHAOS) or by relying on replicated buffers and the shared-memory interface supported by software DSMs (TreadMarks). We introduce LocalWrite, a new technique for parallelizing irregular reductions based on the owner-computes rule. It eliminates the need for buffers or synchronized writes, but may replicate computation. We investigate the impact of connectivity (node/edge ratio), locality (accesses to local data) and adaptivity (edge modifications) on the relative performance of these techniques. LocalWrite improves performance by 50-150% compared to using replicated buffers, and can match or exceed gather/scatter for applications with low locality or high adaptivity.
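The trade-off described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the processors are simulated by a sequential loop, and the names `owner`, `edge_force`, `replicated_buffers` and `local_write`, the block partition of nodes, and the toy per-edge computation are all assumptions made for illustration. The sketch contrasts a replicated-buffer reduction, where each processor accumulates into a private full-size copy that is later combined, with an owner-computes reduction, where each processor writes only to the nodes it owns and so re-evaluates edges that cross a partition boundary.

```python
def owner(node, num_nodes, num_procs):
    """Hypothetical block partition: index of the processor owning `node`."""
    block = -(-num_nodes // num_procs)  # ceiling division
    return node // block


def edge_force(x, u, v):
    """Toy per-edge computation (an assumption, stands in for real physics)."""
    return x[u] - x[v]


def sequential(x, edges):
    """Reference irregular reduction: y is updated through indirection."""
    y = [0] * len(x)
    for u, v in edges:
        f = edge_force(x, u, v)
        y[u] += f
        y[v] -= f
    return y


def replicated_buffers(x, edges, num_procs):
    """Each simulated processor reduces its edges into a private full copy."""
    n = len(x)
    buffers = [[0] * n for _ in range(num_procs)]
    for p in range(num_procs):
        for u, v in edges[p::num_procs]:  # cyclic share of the edge list
            f = edge_force(x, u, v)
            buffers[p][u] += f
            buffers[p][v] -= f
    # Global combine step: cost grows with num_procs * num_nodes.
    return [sum(buf[i] for buf in buffers) for i in range(n)]


def local_write(x, edges, num_procs):
    """Owner-computes sketch: a processor writes only to nodes it owns.

    An edge whose endpoints have different owners is evaluated by both
    owners, which is the replicated computation mentioned in the abstract.
    A real system would precompute each processor's edge list instead of
    scanning all edges as done here.
    """
    n = len(x)
    y = [0] * n
    evaluations = 0
    for p in range(num_procs):
        for u, v in edges:
            owns_u = owner(u, n, num_procs) == p
            owns_v = owner(v, n, num_procs) == p
            if not (owns_u or owns_v):
                continue
            f = edge_force(x, u, v)
            evaluations += 1
            if owns_u:
                y[u] += f  # no buffer and no lock: only the owner writes y[u]
            if owns_v:
                y[v] -= f
    return y, evaluations
```

In this sketch the extra work of the owner-computes variant equals the number of edges that cross a partition boundary, which is why locality (the fraction of accesses to local data) matters for its relative performance.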