Scheduling dynamic OpenMP applications over multicore architectures

  • Authors:
  • François Broquedis;François Diakhaté;Samuel Thibault;Olivier Aumage;Raymond Namyst;Pierre-André Wacrenier

  • Affiliations:
  • INRIA Futurs, LaBRI, Université Bordeaux 1, France;INRIA Futurs, LaBRI, Université Bordeaux 1, France;INRIA Futurs, LaBRI, Université Bordeaux 1, France;INRIA Futurs, LaBRI, Université Bordeaux 1, France;INRIA Futurs, LaBRI, Université Bordeaux 1, France;INRIA Futurs, LaBRI, Université Bordeaux 1, France

  • Venue:
  • IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism
  • Year:
  • 2008

Abstract

Approaching the theoretical performance of hierarchical multicore machines requires a very careful distribution of threads and data among the underlying non-uniform architecture in order to minimize cache misses and NUMA penalties. While it is acknowledged that OpenMP can enhance the quality of thread scheduling on such architectures in a portable way, by transmitting precious information about the affinities between threads and data to the underlying runtime system, most OpenMP runtime systems are actually unable to efficiently support highly irregular, massively parallel applications on NUMA machines. In this paper, we present a thread scheduling policy suited to the execution of OpenMP programs featuring irregular and massive nested parallelism over hierarchical architectures. Our policy enforces a distribution of threads that maximizes the proximity of threads belonging to the same parallel region, and uses a NUMA-aware work stealing strategy when load balancing is needed. It has been developed as a plug-in to the forestGOMP OpenMP platform [TBG+07]. We demonstrate the efficiency of our approach with a highly irregular recursive OpenMP program resulting from the generic parallelization of a surface reconstruction application. We achieve a speedup of 14 on a 16-core machine with no application-level optimization.
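
The abstract does not spell out how the NUMA-aware work stealing strategy picks its victims. As a rough illustration of the general idea only, the sketch below shows one possible locality-first victim order: an idle core tries the queues of cores on its own NUMA node before looking at remote nodes. The topology, the core names, and the functions `node_of`, `victim_order`, and `steal` are all hypothetical and are not taken from the paper or from forestGOMP.

```python
from collections import deque

# Hypothetical toy topology: 2 NUMA nodes with 2 cores each.
# These names are illustrative assumptions, not part of the paper.
TOPOLOGY = {"node0": ["c0", "c1"], "node1": ["c2", "c3"]}


def node_of(core):
    """Return the NUMA node that owns a core in the toy topology."""
    for node, cores in TOPOLOGY.items():
        if core in cores:
            return node
    raise KeyError(core)


def victim_order(thief):
    """List the other cores, same-node cores first, then remote ones."""
    home = node_of(thief)
    local = [c for c in TOPOLOGY[home] if c != thief]
    remote = [c for n, cs in TOPOLOGY.items() if n != home for c in cs]
    return local + remote


def steal(thief, queues):
    """Take one task from the closest non-empty queue, or return None.

    Assumption: each owner consumes its own queue from the left,
    so the thief takes from the right end.
    """
    for victim in victim_order(thief):
        if queues[victim]:
            return victim, queues[victim].pop()
    return None
```

With per-core queues stored as `deque` objects in a dictionary keyed by core name, `steal("c0", queues)` only reaches `c2` or `c3` once `c1` has nothing left, which is the locality preference the abstract alludes to. A real runtime would work on a deeper hierarchy (shared caches, chips, NUMA nodes) and would need synchronization around the queues.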