Optimizing a conjugate gradient solver with non-blocking collective operations

  • Authors: Torsten Hoefler; Peter Gottschling; Andrew Lumsdaine; Wolfgang Rehm

  • Affiliations:
      Torsten Hoefler: Indiana University, Open Systems Lab, Bloomington, IN 47404, USA, and Technical University of Chemnitz, Department of Computer Science, 09107 Chemnitz, Germany
      Peter Gottschling: Indiana University, Open Systems Lab, Bloomington, IN 47404, USA
      Andrew Lumsdaine: Indiana University, Open Systems Lab, Bloomington, IN 47404, USA
      Wolfgang Rehm: Technical University of Chemnitz, Department of Computer Science, 09107 Chemnitz, Germany

  • Venue: Parallel Computing
  • Year: 2007

Abstract

This paper presents a case study that analyzes the suitability and usage of non-blocking collective operations in parallel applications. As with their point-to-point counterparts, non-blocking collective operations provide the ability to overlap communication with computation and to avoid unnecessary synchronization. These operations are provided for MPI programs by LibNBC, a portable, low-overhead implementation of non-blocking collective operations built on MPI-1. The straightforward applicability of LibNBC is demonstrated by incorporating non-blocking collective operations into a parallel conjugate gradient solver. Although only minor changes are required to use them, non-blocking collective operations allow most of the communication costs to be hidden and provide performance improvements of up to 34%. We also show that, because of overlap, there is no significant performance difference between Gigabit Ethernet and InfiniBand™ for special cases of our calculation.
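
As a rough illustration of why overlap can hide communication cost, the sketch below compares a blocking and an overlapped solver iteration under a deliberately simple cost model. It is not the paper's measurement method or LibNBC code: the function names, the `overhead` parameter, the assumption of ideal overlap, and the example timings are all hypothetical and chosen only for illustration.

```python
def blocking_time(compute, comm):
    """Per-iteration time when the collective blocks: the costs simply add."""
    return compute + comm


def overlapped_time(compute, comm, overhead=0.0):
    """Per-iteration time under ideal overlap (an assumption of this sketch).

    Communication proceeds concurrently with independent computation, so the
    longer of the two dominates. `overhead` models the extra CPU cost of
    starting and completing the non-blocking operation.
    """
    return max(compute, comm) + overhead


def improvement(compute, comm, overhead=0.0):
    """Fractional reduction in iteration time gained by overlapping."""
    t_block = blocking_time(compute, comm)
    return (t_block - overlapped_time(compute, comm, overhead)) / t_block


# Hypothetical timings in milliseconds, not taken from the paper.
fast_net = improvement(compute=10.0, comm=2.0, overhead=0.5)
slow_net = improvement(compute=10.0, comm=8.0, overhead=0.5)
```

In this model both example networks give the same overlapped iteration time, because communication is fully hidden whenever it is shorter than the computation it overlaps with. Under these assumptions, that is one way a slower interconnect could match a faster one, while the relative gain over the blocking version is larger on the slower network.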