Effective communication and computation overlap with hybrid MPI/SMPSs

Authors:
Vladimir Marjanovic;Jesús Labarta;Eduard Ayguadé;Mateo Valero
Affiliations:
Barcelona Supercomputing Center (BSC-CNS), Technical University of Catalunya (UPC), Barcelona, Spain (all authors)
Venue:
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Year:
2010

Abstract

Communication overhead is one of the dominant factors affecting performance in high-performance computing systems. To reduce its impact, programmers overlap communication and computation using asynchronous communication primitives. This increases code complexity, requires more development effort, and makes programs less readable. This paper presents the hybrid use of MPI and SMPSs (SMP superscalar, a task-based shared-memory programming model), which allows the programmer to easily introduce the asynchrony needed to overlap communication and computation. We demonstrate the hybrid use of MPI/SMPSs with the High-Performance LINPACK benchmark (HPL) and compare it to the pure MPI implementation, which uses the look-ahead technique to overlap communication and computation. The hybrid MPI/SMPSs version performs significantly better than the pure MPI version, getting close to the asymptotic performance at medium problem sizes and still showing significant benefits at small and large problem sizes.
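
The general idea of overlapping communication with computation can be illustrated with a minimal sketch. This is not MPI or SMPSs code and is not taken from the paper: it assumes a Python thread pool as a stand-in for a task runtime, and the names fake_receive, update_independent, update_dependent, and run_overlapped are hypothetical. The "communication" is only a simulated delay. The point is the shape of the program: the receive is started as its own task, independent work proceeds in the meantime, and the program waits only where the received data is actually needed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_receive(delay_s, payload):
    """Hypothetical stand-in for a blocking receive; the sleep models latency."""
    time.sleep(delay_s)
    return payload


def update_independent(block):
    """Computation that does not need the incoming data."""
    return [x * 2 for x in block]


def update_dependent(block, panel):
    """Computation that needs the received data."""
    return [x + p for x, p in zip(block, panel)]


def run_overlapped(local_block, incoming_panel, delay_s=0.05):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Start the communication as a separate task.
        recv = pool.submit(fake_receive, delay_s, incoming_panel)
        # Independent computation runs while the receive is in flight.
        independent = update_independent(local_block)
        # Data dependence: block only at the point where the panel is required.
        panel = recv.result()
        return update_dependent(independent, panel)


if __name__ == "__main__":
    print(run_overlapped([1, 2, 3], [10, 20, 30]))
```

In this sketch the ordering is written by hand; a task-based model of the kind the abstract describes would instead let the runtime derive that ordering, which is where the claimed reduction in code complexity would come from.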