Programming with POSIX threads
Programming with POSIX threads
MPI versus MPI+OpenMP on IBM SP for the NAS benchmarks
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Extending OpenMP for NUMA machines
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
OpenMP on networks of workstations
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
MPI-The Complete Reference, Volume 1: The MPI Core
MPI-The Complete Reference, Volume 1: The MPI Core
Automatic parallelization for symmetric shared-memory multiprocessors
CASCON '96 Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research
Compiler and Runtime Support for Running OpenMP Programs on Pentium-and Itanium-Architectures
HIPS '03 Proceedings of the Eighth International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS'03)
Development of mixed mode MPI / OpenMP applications
Scientific Programming
Source-Code-Correlated Cache Coherence Characterization of OpenMP Benchmarks
IEEE Transactions on Parallel and Distributed Systems
OpenUH: an optimizing, portable OpenMP compiler: Research Articles
Concurrency and Computation: Practice & Experience - Current Trends in Compilers for Parallel Computers (CPC2006)
International Journal of Parallel Programming
OpenMP tasks in IBM XL compilers
CASCON '08 Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Exploiting global optimizations for openmp programs in the openuh compiler
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Scalability Evaluation of Barrier Algorithms for OpenMP
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Implementing OpenMP on a high performance embedded multicore MPSoC
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
Barrier Optimization for OpenMP Program
SNPD '09 Proceedings of the 2009 10th ACIS International Conference on Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing
CLOMP: accurately characterizing OpenMP application overheads
International Journal of Parallel Programming
Analyzing overheads and scalability characteristics of openMP applications
VECPAR'06 Proceedings of the 7th international conference on High performance computing for computational science
Evaluating OpenMP on chip multithreading platforms
IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
Structure and algorithm for implementing OpenMP workshares
WOMPAT'04 Proceedings of the 5th international conference on OpenMP Applications and Tools: shared Memory Parallel Programming with OpenMP
A ROSE-Based OpenMP 3.0 research compiler supporting multiple runtime libraries
IWOMP'10 Proceedings of the 6th international conference on Beyond Loop Level Parallelism in OpenMP: accelerators, Tasking and more
IBM Blue Gene/Q system software stack
IBM Journal of Research and Development
Hi-index | 0.00 |
As newer supercomputers continue to increase the number of threads, there is growing pressure on applications to exploit more of the available parallelism in their codes, including coarse-, medium-, and fine-grain parallelism. OpenMPi is one of the dominant shared-memory programming models and is well suited for exploiting medium- and fine-grain parallelism. OpenMP research has focused on application tuning, compiler optimizations, programming-model extensions, and porting to distributed-memory platforms; however, we have found that current algorithms used to implement basic OpenMP constructs have significant overheads and scale poorly. In this paper, we explore low-overhead, scalable algorithms for creating parallel regions and demonstrate reductions in overhead of up to a factor of 5 on an IBM Blue Gene®/Q node.