Out-of-Core Wavefront Computations with Reduced Synchronization

  • Authors:
  • Pierre-Nicolas Clauss;Jens Gustedt;Frederic Suter

  • Affiliations:
  • -;-;-

  • Venue:
  • PDP '08 Proceedings of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008)
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Matrix computation algorithms often exhibit dependencies between neighboring elements inside loop nests such that the frontier between computed elements and those to be computed wanders in form of a "wave" through the matrix. Macro-pipelining techniques can achieve an efficient parallelization of such algorithms by overlapping communication and computation. Usually these techniques are limited to situations where all the data to be processed fits into main memory, whereas for larger data the I/O usage pattern for external storage requires special attention. A previous work presented a first extension of the wavefront framework to these so-called out-of-core problems. The present paper proposes a redesign of their algorithm that minimizes both overhead and perturbations coming from communications. To tackle the issue of non-contiguous I/O, we also propose an optimized data layout. These two major modifications of the original algorithm eventually allow us to present a third improvement as our implementation shortens the transition phase between two consecutive iterations of the wavefront algorithm. Experiments performed with the ParXXL library show that we can significantly reduce the time lost during inefficient I/O operations and thus obtain faster computations.