Improving parallel irregular reductions using partial array expansion

Authors:
Eladio Gutiérrez;Oscar Plata;Emilio L. Zapata
Affiliations:
University of Málaga, Spain;University of Málaga, Spain;University of Málaga, Spain
Venue:
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Year:
2001

Abstract

Much recent effort has been devoted to parallelizing irregular reductions efficiently. In this paper, parallelization techniques for these computations are analyzed in terms of three performance aspects: parallelism, data locality, and memory overhead. These aspects strongly influence the overall performance and scalability of the parallel code. We discuss how existing parallelization techniques usually optimize some of these aspects while neglecting the others. We show that combining complementary techniques improves the overall performance and scalability of the parallel irregular reduction, yielding an effective solution for large problems on large machines. Specifically, we introduce partial array expansion, a combination of array expansion and a locality-oriented method (DWA-LIP). An implementation of the proposed method is discussed, showing that the transformation the compiler must apply to the irregular reduction code is not excessively complex. Finally, the method is analyzed and experimentally evaluated.
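
For readers unfamiliar with the terms, the following minimal sketch illustrates what an irregular reduction is and how plain array expansion parallelizes it. It is an assumption-laden illustration, not the authors' implementation: the function names, the block distribution of iterations, and the sample data are all hypothetical, and threads are simulated by sequential passes. It covers only full array expansion; the DWA-LIP method and its combination into partial array expansion are not described in the abstract and are not reproduced here.

```python
# Illustrative sketch only: all names and data are assumptions, not from the paper.

def sequential_reduction(n, idx, vals):
    """Irregular reduction: a[idx[k]] += vals[k], where idx is known only at run time."""
    a = [0.0] * n
    for i, v in zip(idx, vals):
        a[i] += v
    return a


def array_expansion_reduction(n, idx, vals, num_threads):
    """Array expansion: each (simulated) thread accumulates into its own private
    copy of the reduction array; the copies are summed at the end."""
    # One private copy per thread, so memory overhead grows as num_threads * n.
    private = [[0.0] * n for _ in range(num_threads)]
    # Block-distribute the loop iterations among the threads.
    chunk = (len(idx) + num_threads - 1) // num_threads
    for t in range(num_threads):  # each pass stands in for one thread's work
        for k in range(t * chunk, min((t + 1) * chunk, len(idx))):
            private[t][idx[k]] += vals[k]  # no write conflicts: copy is private
    # Merge step: combine the private copies element by element.
    return [sum(private[t][j] for t in range(num_threads)) for j in range(n)]


idx = [3, 0, 3, 1, 0, 2, 3, 1]  # hypothetical indirection array
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
expected = sequential_reduction(4, idx, vals)
result = array_expansion_reduction(4, idx, vals, num_threads=3)
```

The sketch makes the trade-off named in the abstract visible: private copies remove write conflicts and expose full parallelism, but the memory cost scales with the number of threads and the scattered accesses through `idx` do nothing for data locality.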