Since APL, reductions and scans have been recognized as powerful programming concepts. By abstracting an accumulation loop (reduction) and an update loop (scan), these constructs admit efficient parallel implementations based on the parallel prefix algorithm. They are often included in high-level languages with a built-in set of operators such as sum, product, and min. MPI provides library routines for reductions, which account for nearly nine percent of all MPI calls in the NAS Parallel Benchmarks (NPB) version 3.2. Some researchers have even advocated reductions and scans as the principal tool for parallel algorithm design.

Also since APL, the idea of applying the reduction control structure to a user-defined operator has been proposed, and several implementations (some parallel) have been reported. This paper presents the first global-view formulation of user-defined scans and an improved global-view formulation of user-defined reductions, demonstrating both in the context of the Chapel programming language. These formulations are then extended to a message-passing context (MPI), transferring global-view abstractions to local-view languages and perhaps signaling a way to enhance local-view languages incrementally. Finally, examples show global-view user-defined reductions "cleaning up" and/or "speeding up" portions of two NAS benchmarks, IS and MG. Together, these generalized reduction and scan abstractions make the full power of the parallel prefix technique available to both global-view and local-view parallel programming.
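To make the core idea concrete, here is a minimal sketch, in Python rather than the paper's Chapel or MPI formulations, of a reduction and an inclusive scan parameterized by a user-defined associative operator. The function names (`user_scan`, `user_reduce`) are illustrative, not from the paper; the loop structure mirrors the pairwise combination (Hillis–Steele recurrence) that the parallel prefix algorithm evaluates in O(log n) parallel steps.

```python
from functools import reduce

def user_scan(op, xs):
    """Inclusive scan (generalized prefix sums) for any associative op.

    Sequentially this does O(n log n) work, but each round's updates are
    independent, which is exactly the structure a parallel prefix
    implementation exploits: after k rounds, out[i] combines the
    2**k elements ending at position i.
    """
    out = list(xs)
    step = 1
    while step < len(out):
        # Iterate high-to-low so out[i - step] is read before this
        # round overwrites it; op need not be commutative.
        for i in range(len(out) - 1, step - 1, -1):
            out[i] = op(out[i - step], out[i])
        step *= 2
    return out

def user_reduce(op, xs, identity):
    """Reduction with a user-defined operator and its identity element."""
    return reduce(op, xs, identity)
```

For example, `user_scan(lambda a, b: a + b, [1, 2, 3, 4])` yields the running sums `[1, 3, 6, 10]`, and the same code works unchanged for max, logical-or, or any other associative user-defined operator.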