Retrospect: deterministic replay of MPI applications for interactive distributed debugging

  • Authors: Aurelien Bouteiller, George Bosilca, Jack Dongarra

  • Affiliations: University of Tennessee, Knoxville (all authors)

  • Venue: PVM/MPI'07: Proceedings of the 14th European Conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
  • Year: 2007

Abstract

Although high-performance computing has been eagerly adopted by users as a way to satisfy a growing demand for computational power, some areas remain poorly explored. The MPI paradigm is considered the keystone of the large-scale development of HPC infrastructure over the last decade. Even today, however, users lack tools that help increase the stability of the software stack or of the applications. In this paper we present and evaluate a tool that lets developers investigate the execution of parallel applications by moving dynamically back and forth along the execution timeline. The tool enforces deterministic replay by means of an unobtrusive message logging mechanism, leading to a simpler and more efficient way to debug parallel software.
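
The general idea of enforcing deterministic replay through message logging can be illustrated with a minimal sketch. This is not the tool's implementation, which the abstract does not detail. It is an in-process Python simulation, assuming that the only source of nondeterminism is the order in which a receiver matches messages from different senders (as with a wildcard receive). All names (`Mailbox`, `run_recording`, `run_replay`) are hypothetical.

```python
import random
from collections import deque


class Mailbox:
    """Pending messages for one receiver: one FIFO queue per sender rank."""

    def __init__(self, pending):
        # pending maps a sender rank to its payloads, in send order.
        self.queues = {src: deque(msgs) for src, msgs in pending.items()}

    def senders_ready(self):
        """Ranks that still have at least one undelivered message."""
        return sorted(src for src, queue in self.queues.items() if queue)

    def pop_from(self, src):
        """Deliver the oldest pending message from the given sender."""
        return self.queues[src].popleft()


def run_recording(pending, rng):
    """Original run: each receive matches an arbitrary ready sender.

    The random choice stands in for network timing. Only the matched
    sender is logged, not the payload, to keep the log small.
    """
    box = Mailbox(pending)
    delivered, event_log = [], []
    while box.senders_ready():
        src = rng.choice(box.senders_ready())
        event_log.append(src)
        delivered.append((src, box.pop_from(src)))
    return delivered, event_log


def run_replay(pending, event_log):
    """Replay run: every receive is forced to match the logged sender."""
    box = Mailbox(pending)
    return [(src, box.pop_from(src)) for src in event_log]


if __name__ == "__main__":
    pending = {1: ["a", "b"], 2: ["c"], 3: ["d", "e"]}
    original, log = run_recording(pending, random.Random())
    # The replay reproduces the original delivery order exactly.
    assert run_replay(pending, log) == original
```

In this sketch the log records only which sender each receive matched. That is enough to reproduce the delivery order because messages from a single sender are delivered in the order they were sent.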