Evaluating OpenMP 3.0 Run Time Systems on Unbalanced Task Graphs

Authors:
Stephen L. Olivier;Jan F. Prins
Affiliations:
University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA;University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Venue:
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Year:
2009

Citing 10

Lazy task creation: a technique for increasing the granularity of parallel programs
LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming

Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming

The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation

Automatic Subspace Clustering of High Dimensional Data
Data Mining and Knowledge Discovery

Scheduling multithreaded computations by work stealing
SFCS '94 Proceedings of the 35th Annual Symposium on Foundations of Computer Science

An adaptive cut-off for task parallelism
Proceedings of the 2008 ACM/IEEE conference on Supercomputing

An Experimental Evaluation of the New OpenMP Tasking Model
Languages and Compilers for Parallel Computing

OpenMP tasks in IBM XL compilers
CASCON '08 Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds

UTS: an unbalanced tree search benchmark
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing

Evaluation of OpenMP task scheduling strategies
IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism

Cited 2

Manycore work stealing
Proceedings of the 8th ACM International Conference on Computing Frontiers

Performance driven cooperation between kernel and auto-tuning multi-threaded interval b&b applications
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I

Abstract

The UTS benchmark is used to evaluate task parallelism in OpenMP 3.0 as implemented in a number of recently released compilers and run-time systems. UTS performs a parallel search of an irregular and unpredictable search space, such as those that arise in combinatorial optimization problems. As such, UTS presents a highly unbalanced task graph that challenges scheduling, load balancing, termination detection, and task coarsening strategies. Scalability and overheads are compared for three versions of the benchmark: one using OpenMP 3.0 tasks, one using Cilk, and an OpenMP implementation without tasks that performs all scheduling, load balancing, and termination detection explicitly. Current OpenMP 3.0 implementations generally exhibit poor behavior on the UTS benchmark.
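To make the kind of workload described in the abstract concrete, the following is a minimal sketch of an unbalanced tree traversal written with OpenMP 3.0 tasks. It is not the UTS source code or the authors' implementation. The hash function `mix`, the constants `ROOT_CHILDREN`, `FANOUT`, and `MAX_DEPTH`, the 24% branching probability, and the function names are all assumptions chosen for illustration; they stand in for whatever node-generation scheme and tree parameters the real benchmark uses. Each node's child count is derived from a hash of its state, so subtree sizes vary unpredictably, and one task is spawned per child.

```c
#include <stdint.h>

/* Illustrative parameters (assumptions, not values from the paper). */
enum { ROOT_CHILDREN = 32, FANOUT = 4, MAX_DEPTH = 64 };

/* Hypothetical stand-in for the benchmark's node hashing:
 * a splitmix64-style bit mixer. */
static uint64_t mix(uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

/* The root always has ROOT_CHILDREN children. Any other node has FANOUT
 * children with probability about 0.24 and none otherwise, so subtree
 * sizes differ widely. MAX_DEPTH guarantees termination. */
static int num_children(uint64_t state, int depth) {
    if (depth >= MAX_DEPTH) return 0;
    if (depth == 0) return ROOT_CHILDREN;
    return (mix(state) % 100 < 24) ? FANOUT : 0;
}

/* Derive the state of the i-th child from the parent's state. */
static uint64_t child_state(uint64_t state, int i) {
    return mix(state ^ (uint64_t)(i + 1));
}

/* Serial reference: count the nodes in the subtree rooted at `state`. */
static uint64_t count_serial(uint64_t state, int depth) {
    uint64_t total = 1;
    int n = num_children(state, depth);
    for (int i = 0; i < n; i++)
        total += count_serial(child_state(state, i), depth + 1);
    return total;
}

/* Task version: one OpenMP task per child, joined with taskwait. */
static uint64_t count_tasks(uint64_t state, int depth) {
    uint64_t sub[ROOT_CHILDREN] = {0};   /* large enough for any node */
    int n = num_children(state, depth);
    for (int i = 0; i < n; i++) {
        uint64_t child = child_state(state, i);
        /* sub must be shared so each task writes into the parent's array. */
        #pragma omp task shared(sub) firstprivate(i, child, depth)
        sub[i] = count_tasks(child, depth + 1);
    }
    #pragma omp taskwait                 /* sub[] stays alive until here */
    uint64_t total = 1;
    for (int i = 0; i < n; i++)
        total += sub[i];
    return total;
}

/* Entry point: one thread creates the root task inside a parallel region. */
static uint64_t count_parallel(uint64_t root) {
    uint64_t total = 0;
    #pragma omp parallel
    #pragma omp single
    total = count_tasks(root, 0);
    return total;
}
```

Because the tree is generated deterministically from the root state, `count_parallel(seed)` should return the same node count as `count_serial(seed, 0)` for any seed. When compiled without OpenMP support, the pragmas are ignored and the task version runs serially. The sketch creates a task for every node and applies no cut-off or task coarsening, so it is exposed to the task-creation and scheduling overheads that the abstract identifies as a challenge.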