Shortcut Replay: A Replay Technique for Debugging Long-Running Parallel Programs

  • Authors:
  • Nam Thoai;Dieter Kranzlmüller;Jens Volkert

  • Venue:
  • ASIAN '02 Proceedings of the 7th Asian Computing Science Conference on Advances in Computing Science: Internet Computing and Modeling, Grid Computing, Peer-to-Peer Computing, and Cluster
  • Year:
  • 2002

Abstract

Applications running on HPC platforms, PC clusters, or computational grids are often long-running parallel programs. Debugging these programs is challenging because efficient debugging tools are lacking and parallel programs are inherently prone to nondeterminism. To overcome nondeterminism, several sophisticated record-and-replay mechanisms have been developed. However, the substantial problem of waiting time during re-execution has not been sufficiently investigated. This paper shows that, with currently available methods, the waiting time is in some cases unlimited, which rules out efficient interactive debugging tools. In contrast, the new shortcut replay method combines checkpointing and debugging techniques. It controls the replayed execution based on the trace data in order to minimize the waiting time when debugging long-running parallel programs.
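
The general idea of combining a recorded trace with checkpoints can be illustrated with a small sketch. It is not the shortcut replay algorithm from the paper, only a single-process toy under stated assumptions: the names `step`, `record`, and `replay_to` are hypothetical, the "program" is an order-sensitive fold over received messages, and the sketch ignores the messages that cross between processes in a real parallel run. It shows why replaying from the nearest checkpoint, driven by the recorded receive order, bounds the waiting time by the checkpoint interval instead of by the length of the run.

```python
from bisect import bisect_right


def step(state, sender, value):
    # Order-sensitive update: a different receive order yields a different
    # state, which stands in for nondeterministic behaviour.
    return state * 31 + sender * 7 + value


def record(messages, checkpoint_every):
    """Run once, logging the receive order and taking periodic checkpoints."""
    trace = []            # (sender, value) for each receive event
    checkpoints = {0: 0}  # event index -> state just before that event
    state = 0
    for i, (sender, value) in enumerate(messages):
        if i % checkpoint_every == 0:
            checkpoints[i] = state
        trace.append((sender, value))
        state = step(state, sender, value)
    return trace, checkpoints, state


def replay_to(trace, checkpoints, target):
    """Rebuild the state just before event `target`.

    Returns the state and the number of events that had to be re-executed.
    """
    indices = sorted(checkpoints)
    # Latest checkpoint taken at or before the target event.
    start = indices[bisect_right(indices, target) - 1]
    state = checkpoints[start]
    # Re-execute only the remaining events, in the recorded order.
    for sender, value in trace[start:target]:
        state = step(state, sender, value)
    return state, target - start
```

For example, with 100 recorded events and a checkpoint every 10 events, `replay_to(trace, checkpoints, 95)` re-executes 5 events, whereas replaying with only the initial checkpoint `{0: 0}` re-executes all 95 and reaches the same state.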