Although OpenMP has become the leading standard in parallel programming, the implementation of its runtime environment is rarely discussed in the literature. In this paper, we introduce some of the key data structures required to implement OpenMP workshares in our runtime library and discuss how to improve their performance. Topics include how to set up a workshare control block queue, how to initialize the data within a control block, how to improve barrier performance, and how to handle implicit barrier and nowait situations. Finally, we discuss the performance of this implementation, focusing on the EPCC benchmark.
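To make the idea of a workshare control block queue concrete, the following is a minimal sketch in C. It is not the runtime library described in the paper. Every name in it (`ws_block`, `ws_queue`, `ws_thread`, `ws_enter`, `ws_next_chunk`, `ws_leave`, `WS_QUEUE_SIZE`) is hypothetical, and the design is an assumption: a fixed-size circular queue in which the first thread to reach a workshare initializes the control block, later threads reuse it, and the last thread to leave releases the slot. Locking is omitted for brevity; a real runtime would protect these updates with locks or atomic operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WS_QUEUE_SIZE 4        /* hypothetical limit on workshares in flight */

/* Hypothetical control block describing one loop workshare. */
typedef struct {
    bool in_use;               /* slot is owned by an active workshare */
    unsigned long id;          /* sequence number of the owning workshare */
    long next;                 /* next iteration not yet handed out */
    long end;                  /* one past the last iteration */
    long chunk;                /* iterations handed out per request */
    int remaining;             /* threads that have not yet left */
} ws_block;

typedef struct {
    ws_block slot[WS_QUEUE_SIZE];
    int nthreads;
} ws_queue;

/* Per-thread state: how many workshares this thread has entered so far. */
typedef struct {
    unsigned long seen;
} ws_thread;

void ws_queue_init(ws_queue *q, int nthreads) {
    for (size_t i = 0; i < WS_QUEUE_SIZE; i++)
        q->slot[i].in_use = false;
    q->nthreads = nthreads;
}

/* Enter the thread's next workshare over iterations [lo, hi).
 * Returns NULL when the slot still belongs to an older workshare, which
 * can happen when nowait lets a thread run WS_QUEUE_SIZE workshares
 * ahead; the caller is expected to wait and retry.
 * NOTE: no locking here; this sketch assumes serialized calls. */
ws_block *ws_enter(ws_queue *q, ws_thread *t, long lo, long hi, long chunk) {
    unsigned long id = t->seen;
    ws_block *b = &q->slot[id % WS_QUEUE_SIZE];
    if (b->in_use && b->id != id)
        return NULL;           /* queue full: slot not yet released */
    if (!b->in_use) {          /* first arrival initializes the block */
        b->in_use = true;
        b->id = id;
        b->next = lo;
        b->end = hi;
        b->chunk = chunk;
        b->remaining = q->nthreads;
    }
    t->seen++;
    return b;
}

/* Hand out the next chunk [*start, *stop); false when no work is left. */
bool ws_next_chunk(ws_block *b, long *start, long *stop) {
    if (b->next >= b->end)
        return false;
    *start = b->next;
    *stop = (b->end - b->next < b->chunk) ? b->end : b->next + b->chunk;
    b->next = *stop;
    return true;
}

/* Leave the workshare; the last thread out releases the slot for reuse. */
void ws_leave(ws_block *b) {
    assert(b->in_use && b->remaining > 0);
    if (--b->remaining == 0)
        b->in_use = false;
}
```

In this sketch a thread calls `ws_enter`, loops on `ws_next_chunk` until it returns false, and then calls `ws_leave`. With an implicit barrier the thread would synchronize after `ws_leave`. With nowait it would go straight to the next `ws_enter`, which is why the queue holds more than one control block.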