The introduction of task-level parallelization promises to raise the level of abstraction compared to thread-centric expression of parallelism. However, tasks may perform poorly on NUMA systems if data locality cannot be maintained. In contrast to traditional OpenMP worksharing constructs, for which threads can be bound, the behavior of tasks is much less predetermined by the OpenMP specification, and implementations have a high degree of freedom in how they schedule tasks. Using different approaches to express task parallelism, namely the single-producer and parallel-producer patterns combined with different data initialization strategies, we compare the behavior and quality of OpenMP implementations running task-parallel codes on NUMA architectures. For the programmer, we propose recipes for expressing parallelism with tasks that preserve data locality while optimizing the degree of parallelism. Our proposals are evaluated on reasonably large NUMA systems with both important application kernels and a real-world simulation code.
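To make the two task-creation patterns named in the abstract concrete, the following is a minimal sketch in C with OpenMP. It is an illustration under stated assumptions, not the code evaluated in the paper: the function names (`init_parallel`, `run_single_producer`, `run_parallel_producer`), the kernel `process_chunk`, and the chunking scheme are all hypothetical. It assumes a first-touch page placement policy, so that initializing the array in a statically scheduled parallel loop spreads its pages across NUMA nodes. Where a task actually runs remains up to the OpenMP runtime.

```c
#include <assert.h>

/* Hypothetical kernel: update one chunk [begin, end) of the array. */
static void process_chunk(double *a, long begin, long end) {
    for (long i = begin; i < end; ++i)
        a[i] = 2.0 * a[i] + 1.0;
}

/* Parallel data initialization. Assumption: with a first-touch policy,
 * each page is placed on the NUMA node of the thread that writes it first. */
void init_parallel(double *a, long n) {
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)
        a[i] = (double)i;
}

/* Single-producer pattern: one thread creates all tasks,
 * the other threads of the team only execute them. */
void run_single_producer(double *a, long n, long chunk) {
    assert(chunk > 0);
    #pragma omp parallel
    #pragma omp single
    {
        for (long b = 0; b < n; b += chunk) {
            long e = (b + chunk < n) ? b + chunk : n;
            #pragma omp task firstprivate(a, b, e)
            process_chunk(a, b, e);
        }
    }   /* implicit barrier: all tasks have finished here */
}

/* Parallel-producer pattern: every thread creates the tasks for its share
 * of the chunks. The static schedule roughly mirrors the distribution used
 * in init_parallel, so a task tends to be created near its data. */
void run_parallel_producer(double *a, long n, long chunk) {
    assert(chunk > 0);
    long nchunks = (n + chunk - 1) / chunk;
    #pragma omp parallel
    {
        #pragma omp for schedule(static) nowait
        for (long c = 0; c < nchunks; ++c) {
            long b = c * chunk;
            long e = (b + chunk < n) ? b + chunk : n;
            #pragma omp task firstprivate(a, b, e)
            process_chunk(a, b, e);
        }
    }   /* implicit barrier at the end of the parallel region completes all tasks */
}
```

Both variants compute the same result; they differ only in which threads create the tasks. Compiled without OpenMP support the pragmas are ignored and the code runs serially. Whether the parallel-producer variant actually keeps work close to its data depends on thread binding and on the runtime's task scheduling and stealing policy, which is the degree of freedom the abstract refers to.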