An Efficient OpenMP Runtime System for Hierarchical Architectures

Authors:
Samuel Thibault; François Broquedis; Brice Goglin; Raymond Namyst; Pierre-André Wacrenier
Affiliations:
INRIA Futurs - LaBRI, Talence cedex, France 33405 (all authors)
Venue:
IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
Year:
2007

Abstract

Exploiting the full computational power of increasingly deep hierarchical multiprocessor machines requires a very careful distribution of threads and data across the underlying non-uniform architecture. The emergence of multi-core chips and NUMA machines makes it important to minimize the number of remote memory accesses, to favor cache affinities, and to guarantee fast completion of synchronization steps. By using the BubbleSched platform as a threading backend for the GOMP OpenMP compiler, we are able to easily translate the affinities of thread teams into scheduling hints using abstractions called bubbles. We then propose a scheduling strategy suited to nested OpenMP parallelism. Preliminary performance evaluations show a significant speedup improvement on a typical NAS OpenMP benchmark application.
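
As an illustration of the nested OpenMP parallelism the abstract refers to, the following minimal C sketch forks an outer team whose members each fork an inner team. It uses only standard OpenMP constructs and does not show the BubbleSched or GOMP internals; the function name `nested_team_sum` and its parameters are hypothetical and chosen for this example only. Each inner team is the kind of thread group whose affinity a runtime could, under the approach described above, express as a bubble.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper (not from the paper): sums the values
 * i * cols + j over a rows x cols index space, using an outer team
 * whose members each fork their own inner team. */
long nested_team_sum(int outer_threads, int inner_threads, int rows, int cols)
{
    assert(outer_threads > 0 && inner_threads > 0);
    long total = 0;

#ifdef _OPENMP
    /* Allow two active levels of parallelism (standard OpenMP call). */
    omp_set_max_active_levels(2);
#endif

    /* Outer team: rows are shared among the outer threads. */
    #pragma omp parallel for num_threads(outer_threads) reduction(+:total)
    for (int i = 0; i < rows; i++) {
        long row_sum = 0;

        /* Inner team: forked by whichever outer thread handles row i.
         * Threads of one inner team work on the same row, so they are
         * natural candidates for being scheduled close together. */
        #pragma omp parallel for num_threads(inner_threads) reduction(+:row_sum)
        for (int j = 0; j < cols; j++) {
            row_sum += (long)i * cols + j;
        }

        total += row_sum;
    }
    return total;
}
```

With GCC this can be built with `-fopenmp`; without that flag the pragmas are ignored and the function runs sequentially with the same result. For instance, `nested_team_sum(2, 4, 4, 8)` sums the integers 0 through 31.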