High-level Language Support for User-defined Reductions

Authors:
Steven J. Deitz; Bradford L. Chamberlain; Lawrence Snyder
Affiliations:
University of Washington, Seattle, WA 98195-2350 USA (deitz@cs.washington.edu; brad@cs.washington.edu; snyder@cs.washington.edu)
Venue:
The Journal of Supercomputing
Year:
2002

Abstract

Reductions are common in scientific codes and a potential source of bottlenecks, so handling them efficiently on parallel supercomputers and clusters of workstations is critical to high performance. Yet many high-level languages still lack a mechanism for writing efficient reductions. When such mechanisms do exist, they often lack the flexibility a programmer needs to reach the desired level of performance. In this paper, we present a new language construct for arbitrary reductions that lets a programmer match the performance achievable with the highly flexible but low-level combination of Fortran and MPI. We have implemented this construct in the ZPL language and evaluated it in the context of the initialization of the NAS MG benchmark. We show a 45-fold speedup over the same code written in ZPL without this construct. In addition, on a large number of processors, performance surpasses that of the NAS implementation, showing that our mechanism gives programmers the flexibility they need.
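To make the idea of an "arbitrary" user-defined reduction concrete, the sketch below shows the general shape such a construct usually takes. It is written in Python, not ZPL, and it is not the paper's syntax or implementation. The names `UserReduction`, `identity`, `accumulate`, `combine`, `parallel_reduce`, and `top_k` are hypothetical. The program supplies an identity state, a function that folds one local element into a state, and an associative function that merges two states. The runtime then reduces each processor's block locally, with no communication, and merges the partial states in a tree, as a message-passing reduction would. Processors are only simulated here. The example reduction, keeping the k largest values together with their indices, was chosen for illustration because a built-in sum or max reduction cannot express it in a single pass.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Generic, List, Sequence, TypeVar

T = TypeVar("T")  # element type
S = TypeVar("S")  # reduction state type


@dataclass
class UserReduction(Generic[T, S]):
    """Hypothetical interface for a user-defined reduction (not ZPL syntax)."""
    identity: Callable[[], S]            # state representing "no elements yet"
    accumulate: Callable[[S, T], S]      # fold one local element into a state
    combine: Callable[[S, S], S]         # merge two partial states (must be associative)


def parallel_reduce(data: Sequence[T], red: UserReduction[T, S], nprocs: int) -> S:
    """Simulate a reduction over `nprocs` processors holding block-distributed data."""
    n = len(data)
    chunks = [data[p * n // nprocs:(p + 1) * n // nprocs] for p in range(nprocs)]

    # Phase 1: each simulated processor reduces its own block locally.
    partials: List[S] = []
    for chunk in chunks:
        state = red.identity()
        for x in chunk:
            state = red.accumulate(state, x)
        partials.append(state)

    # Phase 2: merge adjacent partial states pairwise, level by level,
    # mimicking a tree-based combine across processors.
    while len(partials) > 1:
        merged = [red.combine(partials[i], partials[i + 1])
                  for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:
            merged.append(partials[-1])
        partials = merged
    return partials[0]


def top_k(k: int) -> UserReduction:
    """Hypothetical example: keep the k largest (value, index) pairs."""
    return UserReduction(
        identity=lambda: [],
        accumulate=lambda s, x: heapq.nlargest(k, s + [x]),
        combine=lambda a, b: heapq.nlargest(k, a + b),
    )


if __name__ == "__main__":
    values = [0.3, 0.9, 0.1, 0.7, 0.5, 0.8, 0.2, 0.6]
    pairs = [(v, i) for i, v in enumerate(values)]
    print(parallel_reduce(pairs, top_k(3), nprocs=4))
    # prints [(0.9, 1), (0.8, 5), (0.7, 3)]
```

The state carries the indices along with the values, so a single reduction yields both the values and their locations. The alternative, which repeats a built-in max reduction k times, needs k separate passes and k rounds of communication.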