This paper presents a new version of the OMPi OpenMP C compiler, enhanced by lightweight runtime support based on user-level multithreading. A large number of threads can be spawned for a parallel region and multiple levels of parallelism are supported efficiently, without introducing additional overheads to the OpenMP library. Management of nested parallelism is based on an adaptive distribution scheme with hierarchical work stealing that not only favors computation and data locality but also maps directly to recent architectural developments in shared memory multiprocessors. A comparative performance evaluation of several OpenMP implementations demonstrates the efficiency of our approach.