Unrolling loops containing task parallelism

Authors:
Roger Ferrer;Alejandro Duran;Xavier Martorell;Eduard Ayguadé
Affiliations:
Barcelona Supercomputing Center, Barcelona, Spain;Barcelona Supercomputing Center, Barcelona, Spain;Barcelona Supercomputing Center, Barcelona, Spain;Barcelona Supercomputing Center, Barcelona, Spain
Venue:
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Year:
2009

Abstract

Classic loop unrolling increases the performance of sequential loops by reducing the overhead of the non-computational parts of the loop. Unfortunately, when the loop contains parallelism, most compilers either ignore it or perform a naïve transformation. We propose extending the semantics of the loop unrolling transformation to cover loops that contain task parallelism. In these cases, the transformation tries to aggregate the multiple tasks that appear after a classic unrolling phase, reducing the per-iteration overhead. We present an implementation of this extended loop unrolling for OpenMP tasks with two phases: a classic unroll followed by a task aggregation phase. Our aggregation technique covers the special cases where task parallelism appears inside branches or where the loop is uncountable. Our experimental results show that this extended unrolling lets loops with fine-grained tasks reduce the overhead associated with task creation and scale much better.
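
The two phases described in the abstract can be illustrated with a minimal hand-written sketch in C with OpenMP. It is not the authors' compiler implementation: the names `process`, `run_original`, `run_unrolled_aggregated`, `results_match`, and the unroll factor `UNROLL` are hypothetical, and the sketch assumes the per-iteration tasks are independent, the trip count is known, and no task sits inside a branch (the special cases the paper's technique is said to handle are not covered here).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define UNROLL 4 /* hypothetical unroll factor, chosen for illustration */

/* Hypothetical fine-grained work item; iterations are independent. */
static void process(int *data, int i) { data[i] = i * i + 1; }

/* Original loop: one OpenMP task is created per iteration. */
void run_original(int *data, int n) {
    assert(n >= 0);
    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < n; i++) {
            #pragma omp task firstprivate(i, data)
            process(data, i);
        }
    }
}

/* After a classic unroll by UNROLL, the four tasks of each unrolled
 * iteration are aggregated by hand into a single task, so roughly
 * n / UNROLL tasks are created instead of n. */
void run_unrolled_aggregated(int *data, int n) {
    assert(n >= 0);
    #pragma omp parallel
    #pragma omp single
    {
        int i = 0;
        for (; i + UNROLL <= n; i += UNROLL) {
            #pragma omp task firstprivate(i, data)
            {
                process(data, i);
                process(data, i + 1);
                process(data, i + 2);
                process(data, i + 3);
            }
        }
        /* Remainder loop for the iterations left over after unrolling. */
        for (; i < n; i++) {
            #pragma omp task firstprivate(i, data)
            process(data, i);
        }
    }
}

/* Returns 1 if both versions produce the same array contents. */
int results_match(int n) {
    size_t bytes = (size_t)(n > 0 ? n : 1) * sizeof(int);
    int *a = calloc(1, bytes);
    int *b = calloc(1, bytes);
    if (a == NULL || b == NULL) { free(a); free(b); return 0; }
    run_original(a, n);
    run_unrolled_aggregated(b, n);
    int same = memcmp(a, b, bytes) == 0;
    free(a);
    free(b);
    return same;
}
```

Compiled without OpenMP support the pragmas are ignored and both functions run sequentially; with OpenMP enabled (for example `-fopenmp` in GCC), the aggregated version creates fewer, coarser tasks, which is the source of the reduced task-creation overhead the abstract refers to.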