Dynamic Task and Data Placement over NUMA Architectures: An OpenMP Runtime Perspective

Authors:
François Broquedis;Nathalie Furmento;Brice Goglin;Raymond Namyst;Pierre-André Wacrenier
Affiliations:
University of Bordeaux;CNRS LaBRI, Talence F-33405;INRIA;University of Bordeaux;University of Bordeaux
Venue:
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Year:
2009

Abstract

Exploiting the full computational power of current hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-uniform architecture so as to avoid memory access penalties. Directive-based programming languages such as OpenMP provide programmers with an easy way to structure the parallelism of their application and to transmit this information to the runtime system. Our runtime, which is based on a multi-level thread scheduler combined with a NUMA-aware memory manager, converts this information into "scheduling hints" to solve thread/memory affinity issues. It enables dynamic load distribution guided by application structure and hardware topology, thus helping to achieve performance portability. First experiments show that mixed solutions (migrating threads and data) outperform next-touch-based data distribution policies and open possibilities for new optimizations.