Preserving the original MPI semantics in a virtualized processor environment
Science of Computer Programming
Adapting foundational MPI software to multicore systems is a key issue in developing effective parallel software in the high-performance communication domain. To address this issue, we propose a novel technique called MPI Accelerator, or MPIActor for short, a transparent middleware that enhances conventional MPI libraries. The main idea is to optimize MPI routines for multicore systems by adopting a threaded MPI mechanism and multicore-architecture-aware collectives. With MPIActor in place, all MPI processes on each node are mapped to threads within a single process, which greatly reduces the overhead of intra-node point-to-point communication. In addition, each collective routine is implemented as the cooperation of separate intra-node and inter-node collective subroutines, and the intra-node subroutines can be further optimized with multicore-architecture-aware collective algorithms. Based on this idea, a framework comprising an MPI_Reduce routine and a set of point-to-point communication routines has been implemented and evaluated on a 256-core Nehalem platform. Compared with MVAPICH2, the experimental results show that MPIActor significantly improves performance, both on the OSU_LATENCY benchmark for point-to-point communication and on the IMB Reduce benchmark for reduction collectives. In particular, results on the OSU_LATENCY benchmark improve by up to 321%.
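The two-level reduction described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not MPIActor's implementation: the function names `intra_node_reduce` and `hierarchical_reduce` are hypothetical, Python threads stand in for MPI processes mapped into one process per node, a shared list stands in for shared memory, and the inter-node stage is a plain fold where a real system would presumably call the underlying MPI library. The operator is assumed to be associative.

```python
import threading
from functools import reduce
from operator import add


def intra_node_reduce(rank_values, op):
    """Combine the values of all ranks on one node (hypothetical helper).

    Each rank runs as a thread inside a single process, so a contribution
    is published by writing to shared memory rather than sending a message.
    """
    slots = [None] * len(rank_values)  # shared by every thread on the node

    def rank_body(local_rank, value):
        # Each thread writes only its own slot, so no lock is needed.
        slots[local_rank] = value

    threads = [
        threading.Thread(target=rank_body, args=(i, v))
        for i, v in enumerate(rank_values)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # acts as the barrier before the node leader combines

    # The node leader folds the contributions in rank order.
    return reduce(op, slots)


def hierarchical_reduce(values_by_node, op=add):
    """Two-stage reduce: intra-node first, then across nodes.

    values_by_node is a list with one non-empty list of rank values per node.
    """
    # Stage 1: one partial result per node, computed through shared memory.
    partials = [intra_node_reduce(vals, op) for vals in values_by_node]
    # Stage 2: stand-in for the inter-node collective among node leaders.
    return reduce(op, partials)


if __name__ == "__main__":
    # Two simulated nodes with four ranks each.
    print(hierarchical_reduce([[1, 2, 3, 4], [5, 6, 7, 8]]))
```

In this sketch only one value per node crosses the node boundary, which is the property that lets the intra-node stage be tuned separately from the inter-node stage.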