PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Adaptive reduction parallelization techniques
Proceedings of the 14th international conference on Supercomputing
Proceedings of the 14th international conference on Supercomputing
Automatic parallelization of irregular applications
Parallel Computing - special issue on parallel computing for irregular applications
Efficient compiler and run-time support for parallel irregular reductions
Parallel Computing - special issue on parallel computing for irregular applications
Runtime Support and Compilation Methods for User-Specified Irregular Data Distributions
IEEE Transactions on Parallel and Distributed Systems
A Comparison of Parallelization Techniques for Irregular Reductions
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
On Automatic Parallelization of Irregular Reductions on Scalable Shared Memory Systems
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
On the parallelization of irregular and dynamic programs
Parallel Computing
Decoupled software pipelining creates parallelization opportunities
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Hi-index | 0.00 |
Much effort has been devoted recently to efficiently parallelize irregular reductions. In this paper, parallelizing techniques for these computations are analyzed in terms of three performance aspects: parallelism, data locality and memory overhead. These aspects have a strong influence in the overall performance and scalability of the parallel code. We will discuss how the parallelization techniques usually try to optimize some of these aspects, while missing the other(s). We will show that by combining complementary techniques we can improve the overall performance/scalability of the parallel irregular reduction, obtaining an effective solution for large problems on large machines. Specifically, a combination of array expansion and a locality-oriented method (DWA-LIP), named partial array expansion, is introduced. An implementation of the proposed method is discussed, showing that the transformation that the compiler must apply to the irregular reduction code is not excessively complex. Finally, the method is analyzed and experimentally evaluated.