A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors

Authors:
E. Gutiérrez;O. Plata;E. L. Zapata
Affiliations:
Department of Computer Architecture, University of Málaga, E-29080 Málaga, Spain;Department of Computer Architecture, University of Málaga, E-29080 Málaga, Spain;Department of Computer Architecture, University of Málaga, E-29080 Málaga, Spain
Venue:
Proceedings of the 14th international conference on Supercomputing
Year:
2000


Abstract

This paper presents a new parallelization method for reductions of arrays with subscripted subscripts on scalable shared memory multiprocessors. The mapping of computations is based on grouping reduction loop iterations into sets that are then assigned to the cooperating threads of computation. Iterations belonging to the same set are chosen so that they update different entries in the reduction array. That is, the loop distribution implies a conflict-free write distribution of the reduction array. The iteration sets are set up by building a loop-index prefetching data structure that allows the loop iterations to be reordered properly. The proposed method is general, scalable, and easy to implement in a compiler. In addition, it handles single and multiple subscript arrays in a uniform way. In the case of multiple indirection arrays, writes on the reduction array affecting different sets are resolved by defining conflict-free supersets. A performance evaluation is presented. From the experimental results and performance analysis, the proposed method appears as a clear alternative to the array expansion and privatized buffer techniques used in state-of-the-art parallelizing compilers such as Polaris and SUIF. The scalability problem that those techniques exhibit is absent from our method, as its memory overhead does not depend on the number of processors.
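The idea of conflict-free iteration sets can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one plausible reading of the abstract: the reduction array is split into contiguous blocks, one per thread, and each iteration is placed in the set of the block it writes to, so that no two sets ever write the same entry. The names `build_iteration_sets` and `parallel_reduction` are hypothetical, and only the single-subscript-array case is covered. The superset handling for multiple indirection arrays is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def build_iteration_sets(idx, array_len, num_threads):
    """Group loop iterations by the block of the reduction array they write to.

    Assumption: block ownership is a simple contiguous partition. The lists
    hold one entry per iteration in total, so their combined size does not
    grow with the number of threads.
    """
    block = max(1, -(-array_len // num_threads))  # ceiling division
    sets = [[] for _ in range(num_threads)]
    for i, target in enumerate(idx):
        sets[target // block].append(i)
    return sets


def parallel_reduction(idx, values, array_len, num_threads=4):
    """Compute a[idx[i]] += values[i], one worker per iteration set."""
    a = [0] * array_len
    sets = build_iteration_sets(idx, array_len, num_threads)

    def worker(iterations):
        # Each worker only touches entries inside its own block of `a`,
        # so no locking or private copy of the array is needed.
        for i in iterations:
            a[idx[i]] += values[i]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(worker, sets))  # wait for all workers to finish
    return a


if __name__ == "__main__":
    idx = [3, 0, 3, 7, 1, 0, 6, 2]      # subscript (indirection) array
    values = [1, 2, 3, 4, 5, 6, 7, 8]   # contribution of each iteration
    print(build_iteration_sets(idx, 8, 2))
    print(parallel_reduction(idx, values, 8, num_threads=2))
```

In contrast to array expansion, which would allocate one private copy of the reduction array per thread, the only extra storage in this sketch is the reordered list of iteration indices.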