MPI versus MPI+OpenMP on IBM SP for the NAS benchmarks

Authors:
Franck Cappello; Daniel Etiemble
Affiliations:
LRI, Université Paris-Sud, 91405 Orsay, France (both authors)
Venue:
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Year:
2000

Abstract

The hybrid memory model of clusters of multiprocessors raises two issues: the programming model and performance. Many parallel programs have been written using the MPI standard. To evaluate the suitability of hybrid models for existing MPI codes, we compare a unified model (MPI) and a hybrid one (fine-grain OpenMP parallelization, applied after profiling) for the NAS 2.3 benchmarks on two IBM SP systems. Which model performs better depends on (1) the level of shared-memory parallelization, (2) the communication patterns, and (3) the memory access patterns. The relative speeds of the main architectural components (CPU, memory, and network) are critical in choosing between the models. With the hybrid model used here, our results show that the unified MPI approach performs better for most of the benchmarks. The hybrid approach becomes better only when fast processors make communication performance significant and the level of shared-memory parallelization is sufficient.
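The trade-off summarized in the abstract can be made concrete with a deliberately simplified cost model. The sketch below is a hypothetical toy model, not the authors' methodology. It assumes a fixed total computation `work` (seconds on one CPU), one MPI process per CPU in the pure MPI case, and one MPI process per SMP node in the hybrid case. The communication times `comm_mpi` and `comm_hybrid` are treated as inputs a reader would have to measure. The OpenMP level is modeled with Amdahl's law: only the fraction `omp_fraction` of each process's computation is spread across the node's CPUs. All names (`Machine`, `t_mpi`, `t_hybrid`, `hybrid_wins`) are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    nodes: int          # number of SMP nodes in the cluster
    cpus_per_node: int  # CPUs sharing memory inside one node


def t_mpi(work: float, m: Machine, comm_mpi: float) -> float:
    """Pure MPI (toy model): one process per CPU, all computation divided evenly."""
    return work / (m.nodes * m.cpus_per_node) + comm_mpi


def t_hybrid(work: float, m: Machine, comm_hybrid: float, omp_fraction: float) -> float:
    """Hybrid (toy model): one MPI process per node. Only omp_fraction of each
    process's computation runs on all CPUs of the node (fine-grain OpenMP);
    the remainder runs on a single thread, as in Amdahl's law."""
    if not 0.0 <= omp_fraction <= 1.0:
        raise ValueError("omp_fraction must be in [0, 1]")
    per_node = work / m.nodes
    c = m.cpus_per_node
    return per_node * ((1.0 - omp_fraction) + omp_fraction / c) + comm_hybrid


def hybrid_wins(work: float, m: Machine, comm_mpi: float,
                comm_hybrid: float, omp_fraction: float) -> bool:
    """True when the hybrid estimate is strictly faster than pure MPI."""
    return t_hybrid(work, m, comm_hybrid, omp_fraction) < t_mpi(work, m, comm_mpi)


if __name__ == "__main__":
    # Made-up numbers for illustration only, not measurements from the paper.
    m = Machine(nodes=4, cpus_per_node=4)
    # Slow CPUs: computation dominates, pure MPI is faster (12.0 s vs 14.0 s).
    print(hybrid_wins(160.0, m, comm_mpi=2.0, comm_hybrid=1.0, omp_fraction=0.9))  # False
    # 10x faster CPUs: communication now matters, hybrid is faster (2.3 s vs 3.0 s).
    print(hybrid_wins(16.0, m, comm_mpi=2.0, comm_hybrid=1.0, omp_fraction=0.9))   # True
    # Fast CPUs but little OpenMP coverage: pure MPI wins again (3.5 s vs 3.0 s).
    print(hybrid_wins(16.0, m, comm_mpi=2.0, comm_hybrid=1.0, omp_fraction=0.5))   # False
```

Under these assumptions the hybrid estimate is faster exactly when comm_mpi − comm_hybrid > (work / nodes)(1 − omp_fraction)(1 − 1/cpus_per_node). Shrinking `work` (faster processors) and raising `omp_fraction` (more shared-memory parallelization) both favor the hybrid model. This mirrors, qualitatively, the conditions stated in the abstract, but the model ignores memory access patterns and other effects the paper identifies.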