In this work we present a highly efficient implementation of OpenMP tasks. It is based on a runtime infrastructure architected for data locality, a crucial prerequisite for exploiting the NUMA nature of modern multicore multiprocessors. In addition, we employ fast work-stealing structures based on a novel, efficient, and fair blocking algorithm. Synthetic benchmarks show up to a 6-fold increase in throughput (tasks completed per second), while for a task-based OpenMP application suite we measured up to an 87% reduction in execution time compared with other OpenMP implementations.