Experimenting with low-overhead OpenMP runtime on IBM Blue Gene/Q

Authors:
A. E. Eichenberger; K. O'Brien
Affiliations:
IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY (both authors)
Venue:
IBM Journal of Research and Development
Year:
2013

Abstract

As newer supercomputers continue to increase the number of threads, there is growing pressure on applications to exploit more of the available parallelism in their codes, including coarse-, medium-, and fine-grain parallelism. OpenMP is one of the dominant shared-memory programming models and is well suited for exploiting medium- and fine-grain parallelism. OpenMP research has focused on application tuning, compiler optimizations, programming-model extensions, and porting to distributed-memory platforms; however, we have found that current algorithms used to implement basic OpenMP constructs have significant overheads and scale poorly. In this paper, we explore low-overhead, scalable algorithms for creating parallel regions and demonstrate reductions in overhead of up to a factor of 5 on an IBM Blue Gene®/Q node.
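
To make the notion of parallel-region overhead concrete, the following is a minimal C sketch that times repeated entry into a nearly empty OpenMP parallel region. It is not the authors' benchmark or runtime algorithm; the helper names now_seconds and parallel_region_overhead are hypothetical, and the code falls back to serial execution when compiled without OpenMP support.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Current time in seconds. Uses the OpenMP wall clock when available,
   otherwise the C library's processor-time clock. */
static double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical helper (an assumption, not from the paper): average time in
   seconds to enter and leave one nearly empty parallel region, taken over
   `reps` repetitions. Returns 0.0 when reps is not positive. */
double parallel_region_overhead(int reps) {
    if (reps <= 0) {
        return 0.0;
    }
    volatile int sink = 0;  /* keeps the region body from being optimized away */
    double start = now_seconds();
    for (int i = 0; i < reps; i++) {
        /* Each iteration pays the cost of forking and joining the team. */
        #pragma omp parallel
        {
            /* Only the master thread writes, so there is no data race. */
            #pragma omp master
            sink++;
        }
    }
    double elapsed = now_seconds() - start;
    return elapsed / reps;
}
```

Calling parallel_region_overhead(100000) from a program built with an OpenMP-enabled compiler (for example, with -fopenmp on GCC) gives a rough per-region cost. Comparing that value across runtimes or thread counts is one way to observe the kind of overhead the abstract refers to.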