MPIActor - A Multicore-Architecture Adaptive and Thread-Based MPI Program Accelerator

  • Authors:
  • Zhiqiang Liu; Kaijun Ren; Junqiang Song

  • Venue:
  • HPCC '10 Proceedings of the 2010 IEEE 12th International Conference on High Performance Computing and Communications
  • Year:
  • 2010

Abstract

Improving foundational MPI software to suit multicore systems is a key issue in developing effective parallel software for the high-performance communication domain. To address this issue, we propose a novel technique called MPI Accelerator, or MPIActor for short, a transparent middleware that enhances conventional MPI libraries. The main idea is to optimize MPI routines for multicore systems by adopting a threaded MPI mechanism and multicore-architecture-aware collectives. With MPIActor, on the one hand, all MPI processes on each node are mapped to several threads in one process, which can greatly reduce the overhead of intra-node point-to-point communication. On the other hand, collective routines are implemented through the cooperation of separate intra-node and inter-node collective subroutines, and the intra-node subroutines can be further optimized with multicore-architecture-aware collective algorithms. Based on this idea, a framework comprising an MPI_Reduce routine and a set of point-to-point communication routines has been implemented and evaluated on a 256-core Nehalem platform. Compared with MVAPICH2, the experimental results show that MPIActor significantly improves performance on both the OSU_LATENCY benchmark for point-to-point communication and the IMB Reduce benchmark for reduction collectives. In particular, performance on the OSU_LATENCY benchmark improves by up to 321%.
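
The split of a reduction into intra-node and inter-node stages can be illustrated with a minimal sketch. This is not the MPIActor implementation: it is a plain Python simulation with no MPI calls, and the function names (`intra_node_reduce`, `inter_node_reduce`, `hierarchical_reduce`) and the list-of-lists layout are hypothetical, chosen only for illustration. It assumes an associative reduction operator.

```python
from functools import reduce
import operator


def intra_node_reduce(values, op):
    """Hypothetical stage 1: combine the contributions of all ranks on one node.

    In a threaded MPI these ranks would share one address space; here the
    node is simply a Python list of per-rank values.
    """
    return reduce(op, values)


def inter_node_reduce(partials, op):
    """Hypothetical stage 2: combine one partial result per node."""
    return reduce(op, partials)


def hierarchical_reduce(values_by_node, op=operator.add):
    """Two-level reduction: first within each node, then across nodes."""
    partials = [intra_node_reduce(values, op) for values in values_by_node]
    return inter_node_reduce(partials, op)


if __name__ == "__main__":
    # Two simulated nodes with four ranks each (illustrative values only).
    values_by_node = [[1, 2, 3, 4], [5, 6, 7, 8]]
    flat = [v for node in values_by_node for v in node]
    # For an associative operator the two-level result matches a flat reduce.
    assert hierarchical_reduce(values_by_node) == reduce(operator.add, flat)
    print(hierarchical_reduce(values_by_node))
```

In this sketch only one partial value per node takes part in the second stage, which is the structural property the abstract attributes to cooperating intra-node and inter-node subroutines. How MPIActor actually schedules or optimizes each stage is not specified here.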