Many existing OpenMP systems do not adequately implement nested parallelism, presumably because it is believed to require significant implementation effort, incur a large overhead, or lack applications. This paper presents Omni/ST, a simple and efficient implementation of OpenMP nested parallelism built on StackThreads/MP, a fine-grain thread library. With StackThreads/MP, OpenMP parallel constructs map directly onto its thread creation primitives, yet they are managed efficiently with a fixed number of threads in the underlying thread package (e.g., Pthreads). Experimental results on a Sun Ultra Enterprise 10000 with up to 60 processors show that the overhead imposed by nested parallelism is very small (1-3% in five of the six applications and 8% in the remaining one) and that applications with nested parallelism gain a significant scalability benefit.
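To make "nested parallelism" concrete, the following is a minimal sketch of a nested OpenMP region in C. It is not code from Omni/ST or StackThreads/MP; the function name `nested_sum` and the matrix layout are assumptions chosen for illustration. Only standard OpenMP directives are used, so the function returns the same result whether the inner region actually runs in parallel, is serialized by the runtime, or the pragmas are ignored by a compiler without OpenMP support.

```c
#include <stddef.h>

/*
 * Hypothetical example: sum all elements of a rows x cols matrix
 * stored in row-major order, using one parallel region nested
 * inside another.
 */
long nested_sum(const int *m, int rows, int cols)
{
    long total = 0;

    /* Outer parallel region: rows are divided among the outer team. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < rows; i++) {
        long row_total = 0;

        /*
         * Inner (nested) parallel region: each outer thread asks for
         * its own team to split the columns of row i. Whether this
         * team has more than one thread depends on the OpenMP runtime
         * and its nesting settings.
         */
        #pragma omp parallel for reduction(+:row_total)
        for (int j = 0; j < cols; j++) {
            row_total += m[(size_t)i * cols + j];
        }

        total += row_total;
    }
    return total;
}
```

Calling `nested_sum` on a 2 x 3 matrix holding the values 1 through 6 returns 21. In many runtimes the inner region is executed by a single thread unless nesting is explicitly enabled (for example through the `OMP_NESTED` or `OMP_MAX_ACTIVE_LEVELS` environment variables), which is the kind of limited support the abstract refers to.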