CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
A stream-computing extension to OpenMP
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Productive cluster programming with OmpSs
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
A high-productivity task-based programming model for clusters
Concurrency and Computation: Practice & Experience
OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Hi-index | 0.00 |
Communication overhead is one of the dominant factors affecting performance in high-performance computing systems. To reduce the negative impact of communication, programmers overlap communication and computation by using asynchronous communication primitives. This increases code complexity, requiring more development effort and making less readable programs. This paper presents the hybrid use of MPI and SMPSs (SMP superscalar, a task-based shared-memory programming model) that allows the programmer to easily introduce the asynchrony necessary to overlap communication and computation. We demonstrate the hybrid use of MPI/SMPSs with the high-performance LINPACK benchmark (HPL), and compare it to the pure MPI implementation, which uses the look-ahead technique to overlap communication and computation. The hybrid MPI/SMPSs version significantly improves the performance of the pure MPI version, getting close to the asymptotic performance at medium problem sizes and still getting significant benefits at small/large problem sizes.