A practical approach to DOACROSS parallelization

Authors:
Priya Unnikrishnan;Jun Shirako;Kit Barton;Sanjay Chatterjee;Raul Silvera;Vivek Sarkar
Affiliations:
IBM Toronto Laboratory, Canada;Department of Computer Science, Rice University;IBM Toronto Laboratory, Canada;Department of Computer Science, Rice University;IBM Toronto Laboratory, Canada;Department of Computer Science, Rice University
Venue:
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Year:
2012

Abstract

Loops with cross-iteration dependences (doacross loops) often contain significant amounts of parallelism that can potentially be exploited on modern manycore processors. However, most production-strength compilers focus their automatic parallelization efforts on doall loops, and consider doacross parallelism to be impractical due to the space inefficiencies and the synchronization overheads of past approaches. This paper presents a novel and practical approach to automatically parallelizing doacross loops for execution on manycore-SMP systems. We introduce a compiler-and-runtime optimization called dependence folding that bounds the number of synchronization variables allocated per worker thread (processor core) to be at most the maximum depth of a loop nest being considered for automatic parallelization. Our approach has been implemented in a development version of the IBM XL Fortran V13.1 commercial parallelizing compiler and runtime system. For four benchmarks where automatic doall parallelization was largely ineffective (speedups of under 2×), our implementation delivered speedups of 6.5×, 9.0×, 17.3×, and 17.5× on a 32-core IBM Power7 SMP system, thereby showing that doacross parallelization can be a valuable technique to complement doall parallelization.
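To make the idea of a doacross loop concrete, the following is a minimal sketch in Python, not the paper's implementation and not the IBM XL Fortran runtime. It runs a two-deep loop nest whose iteration (i, j) depends on (i-1, j) and (i, j-1). Rows are dealt cyclically to worker threads, and each worker publishes a single progress pair (row, column), that is, two synchronization values for a loop nest of depth two. This only loosely mirrors the bound stated in the abstract; the function names, the cyclic row distribution, and the use of one shared condition variable are all assumptions made for illustration.

```python
import threading


def sequential_wavefront(n, m):
    """Reference version: a[i][j] = a[i-1][j] + a[i][j-1], borders set to 1."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def doacross_wavefront(n, m, num_workers=4):
    """Hypothetical doacross execution of the same loop nest.

    Row i is executed by worker i % num_workers. Each worker owns exactly
    one progress pair (row, col) naming the last iteration it completed,
    so synchronization storage grows with the number of workers, not with
    the number of iterations.
    """
    a = [[1] * m for _ in range(n)]
    done = [(-1, -1)] * num_workers      # one (row, col) pair per worker
    cond = threading.Condition()         # single lock/condition for simplicity

    def worker(w):
        for i in range(w, n, num_workers):
            pred = (i - 1) % num_workers  # worker that owns row i-1
            for j in range(m):
                if i > 0:
                    # Wait until iteration (i-1, j) has completed. Tuples
                    # compare lexicographically, so a predecessor that has
                    # moved on to a later row also satisfies the test.
                    with cond:
                        cond.wait_for(lambda: done[pred] >= (i - 1, j))
                if i > 0 and j > 0:
                    a[i][j] = a[i - 1][j] + a[i][j - 1]
                # Post: publish this worker's progress and wake waiters.
                with cond:
                    done[w] = (i, j)
                    cond.notify_all()

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a
```

Calling `doacross_wavefront(6, 5, num_workers=3)` should produce the same grid as `sequential_wavefront(6, 5)`. The sketch synchronizes on every iteration, which is far too fine-grained to yield speedups in practice; it is meant only to show the wait/post pattern that cross-iteration dependences require and how the synchronization state can stay bounded per worker.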