Today, most multi-socket shared-memory systems exhibit a non-uniform memory architecture (NUMA). However, programming models such as OpenMP do not provide explicit support for it. To overcome this limitation, we propose a platform-independent approach to describe the system topology and to place threads on the hardware. A distance matrix provides the system information and allows thread binding with user-defined strategies. We propose and implement means to query this information from within the program, so that expert users can take advantage of this knowledge, and we demonstrate the usefulness of our approach with an application from the Fraunhofer Institute for Laser Technology in Aachen.
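To illustrate how a distance matrix can drive thread binding with user-defined strategies, the following minimal sketch may help. It is not the interface proposed in the paper: the function names (`make_distance_matrix`, `place_threads`, `compact`, `scatter`), the two-socket topology, and the distance values 10 (same socket) and 20 (remote socket) are all assumptions made for this example. A strategy is simply a function that picks the next core from the free candidates, given a cost function derived from the matrix.

```python
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[int]]
# A strategy picks one core from the candidates, given a per-core cost function.
Strategy = Callable[[List[int], Callable[[int], int]], int]


def make_distance_matrix(sockets: int, cores_per_socket: int,
                         local: int = 10, remote: int = 20) -> List[List[int]]:
    """Build a hypothetical core-to-core distance matrix.

    Assumed values: 0 on the diagonal, `local` between cores on the same
    socket, `remote` between cores on different sockets.
    """
    n = sockets * cores_per_socket
    matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(0)
            elif i // cores_per_socket == j // cores_per_socket:
                row.append(local)
            else:
                row.append(remote)
        matrix.append(row)
    return matrix


def compact(candidates: List[int], cost: Callable[[int], int]) -> int:
    """Pick the free core closest to the cores already in use."""
    return min(candidates, key=cost)


def scatter(candidates: List[int], cost: Callable[[int], int]) -> int:
    """Pick the free core farthest from the cores already in use."""
    return max(candidates, key=cost)


def place_threads(dist: Matrix, num_threads: int, strategy: Strategy) -> List[int]:
    """Greedily assign one core per thread; entry t is the core for thread t."""
    if num_threads > len(dist):
        raise ValueError("more threads than cores")
    chosen: List[int] = []
    free = list(range(len(dist)))  # kept in ascending order for stable ties
    while len(chosen) < num_threads:
        # Cost of a candidate: summed distance to all cores chosen so far.
        def cost(core: int) -> int:
            return sum(dist[core][used] for used in chosen)
        best = strategy(free, cost)
        chosen.append(best)
        free.remove(best)
    return chosen


if __name__ == "__main__":
    dist = make_distance_matrix(sockets=2, cores_per_socket=4)
    print("compact:", place_threads(dist, 4, compact))
    print("scatter:", place_threads(dist, 4, scatter))
```

With the assumed topology, `compact` fills one socket before touching the other, while `scatter` alternates between sockets. The strategy only sees core IDs and distances, so the same code works unchanged for any matrix describing a different machine. The returned core list would still have to be handed to an actual binding mechanism, which this sketch leaves out.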