Assessing OpenMP tasking implementations on NUMA architectures

  • Authors:
  • Christian Terboven; Dirk Schmidl; Tim Cramer; Dieter an Mey

  • Affiliations:
  • Center for Computing and Communication, JARA, RWTH Aachen University, Germany (all authors)

  • Venue:
  • IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
  • Year:
  • 2012

Abstract

The introduction of task-level parallelization promises to raise the level of abstraction compared to the thread-centric expression of parallelism. However, tasks might exhibit poor performance on NUMA systems if data locality cannot be maintained. In contrast to traditional OpenMP worksharing constructs, for which threads can be bound, the behavior of tasks is much less predetermined by the OpenMP specification, and implementations have a high degree of freedom in implementing task scheduling. Employing different approaches to express task parallelism, namely the single-producer and parallel-producer patterns with different data initialization strategies, we compare the behavior and quality of OpenMP implementations with task-parallel codes on NUMA architectures. For the programmer, we propose recipes for expressing parallelism with tasks that preserve data locality while optimizing the degree of parallelism. Our proposals are evaluated on reasonably large NUMA systems with both important application kernels and a real-world simulation code.
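
The two task-creation patterns named in the abstract can be illustrated with a minimal sketch in C with OpenMP. This is not the authors' code: the function names, the `CHUNK` granularity, and the toy kernel are hypothetical and chosen only for illustration. The sketch assumes an operating system with a first-touch page placement policy (the usual default on Linux), so that `init_parallel` distributes pages across NUMA nodes according to the static schedule. Whether a task then runs near its data still depends on the runtime's task scheduler, which the OpenMP specification leaves largely to the implementation.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 1024  /* hypothetical task granularity (elements per task) */

/* Toy kernel executed by one task on the half-open range [begin, end). */
static void process_chunk(double *a, size_t begin, size_t end) {
    for (size_t i = begin; i < end; ++i)
        a[i] = 2.0 * a[i] + 1.0;
}

/* Parallel initialization: under a first-touch policy, each page is
   placed on the NUMA node of the thread that writes it first. */
void init_parallel(double *a, size_t n) {
    assert(a != NULL);
    size_t nchunks = (n + CHUNK - 1) / CHUNK;
    #pragma omp parallel for schedule(static)
    for (size_t c = 0; c < nchunks; ++c) {
        size_t b = c * CHUNK;
        size_t e = (b + CHUNK < n) ? b + CHUNK : n;
        for (size_t i = b; i < e; ++i)
            a[i] = 1.0;
    }
}

/* Single-producer pattern: one thread creates every task; the other
   threads of the team pick tasks up while waiting at the barrier. */
void single_producer(double *a, size_t n) {
    assert(a != NULL);
    #pragma omp parallel
    #pragma omp single
    {
        for (size_t b = 0; b < n; b += CHUNK) {
            size_t e = (b + CHUNK < n) ? b + CHUNK : n;
            #pragma omp task firstprivate(b, e)
            process_chunk(a, b, e);
        }
    }   /* implicit barrier: all tasks have completed here */
}

/* Parallel-producer pattern: every thread creates tasks for the chunks
   assigned to it by the same static schedule used in init_parallel. */
void parallel_producer(double *a, size_t n) {
    assert(a != NULL);
    size_t nchunks = (n + CHUNK - 1) / CHUNK;
    #pragma omp parallel
    {
        #pragma omp for schedule(static)
        for (size_t c = 0; c < nchunks; ++c) {
            size_t b = c * CHUNK;
            size_t e = (b + CHUNK < n) ? b + CHUNK : n;
            #pragma omp task firstprivate(b, e)
            process_chunk(a, b, e);
        }   /* implicit barrier: all tasks have completed here */
    }
}
```

Both functions compute the same result; they differ only in which threads create the tasks. In the parallel-producer variant each task is created by the thread that initialized the corresponding chunk, which gives a locality-aware runtime a chance to keep the task close to its data. Compiled without OpenMP support, the pragmas are ignored and the code runs serially.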