Automatic thread distribution for nested parallelism in OpenMP
Proceedings of the 19th annual international conference on Supercomputing
OpenMP allows programmers to specify nested parallelism in parallel applications. In scientific applications, parallel loops are the most important source of parallelism. In this paper we present an automatic mechanism that dynamically detects the best way to exploit the parallelism of nested parallel loops, based on the number of threads, the problem size, and the number of iterations of each loop. We argue that programmers should specify only the application's potential parallelism and leave the runtime responsible for deciding how best to exploit it. We have implemented this mechanism inside the IBM XL runtime library. Evaluation shows that it dynamically adapts the generated parallelism to the application and runtime parameters, reaching the same speedup as the best static parallelization (which requires a priori information).