Performance Modeling of Communication and Computation in Hybrid MPI and OpenMP Applications

Authors:
Laksono Adhianto; Barbara Chapman
Affiliations:
University of Houston, USA; University of Houston, USA
Venue:
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 2
Year:
2006


Abstract

Performance evaluation and modeling are crucial to optimizing parallel programs. Programs that combine two programming models, such as MPI and OpenMP, require analysis to determine both their performance efficiency and the most suitable numbers of processes and threads for execution on a given platform. To study both problems, we propose a model that is based on a small number of parameters yet is able to capture the complexity of the runtime system. The model must incorporate measurements of the overheads introduced by each programming model, and must therefore cover both the network and the computational aspects of the system. We have combined two techniques: static analysis, driven by the OpenUH compiler, to retrieve application signatures; and a parallelization overhead measurement benchmark, realized by Sphinx and Perfsuite, to collect system profiles. Finally, we propose a performance evaluation measure to identify communication and computation efficiency. In this paper, we describe our underlying framework and the performance model, and show how our tool can be applied to a sample code.
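The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea: combine an application signature with a system profile to predict run time for each split of a fixed core count into MPI processes and OpenMP threads, then pick the cheapest split. Everything in it is an assumption made for illustration and is not the authors' model: the names `SystemProfile`, `AppSignature`, `predict_breakdown`, `best_configuration`, and `computation_efficiency`, the latency-plus-bandwidth message cost, the tree-shaped exchange with ceil(log2(processes)) rounds per step, and the per-thread overhead of one parallel region per step.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemProfile:
    """Hypothetical system profile (values would come from benchmarks)."""
    latency_s: float              # per-message network latency, seconds
    bandwidth_bytes_per_s: float  # network bandwidth, bytes per second
    omp_region_overhead_s: float  # assumed cost per extra thread per region


@dataclass(frozen=True)
class AppSignature:
    """Hypothetical application signature (would come from static analysis)."""
    serial_work_s: float  # total computation time on a single core
    steps: int            # iterations, each with one parallel region
    msg_bytes: float      # bytes per message in each exchange round


def predict_breakdown(app, system, procs, threads):
    """Return (compute, openmp_overhead, mpi_communication) in seconds."""
    # Assumption: computation scales perfectly over all cores.
    compute = app.serial_work_s / (procs * threads)
    # Assumption: one parallel region per step, cost linear in extra threads.
    openmp = app.steps * system.omp_region_overhead_s * (threads - 1)
    # Assumption: a tree-shaped exchange with ceil(log2(procs)) rounds per step.
    rounds = math.ceil(math.log2(procs))
    per_message = system.latency_s + app.msg_bytes / system.bandwidth_bytes_per_s
    mpi = app.steps * rounds * per_message
    return compute, openmp, mpi


def predict_time(app, system, procs, threads):
    """Total predicted time: computation plus both overhead terms."""
    return sum(predict_breakdown(app, system, procs, threads))


def best_configuration(app, system, cores):
    """Pick the (processes, threads) split of `cores` with the lowest time."""
    candidates = [(p, cores // p) for p in range(1, cores + 1) if cores % p == 0]
    return min(candidates, key=lambda c: predict_time(app, system, *c))


def computation_efficiency(app, system, procs, threads):
    """Fraction of the predicted time spent in useful computation."""
    compute, openmp, mpi = predict_breakdown(app, system, procs, threads)
    return compute / (compute + openmp + mpi)
```

With a profile where messages are expensive and threads are free, `best_configuration` favors one process with many threads; with the opposite profile it favors many single-threaded processes. A realistic model would replace each assumed term with measured overheads.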