Robust non-intrusive record-replay with processor extraction

  • Authors: Filippo Gioachin, Gengbin Zheng, Laxmikant V. Kalé

  • Affiliations: University of Illinois at Urbana-Champaign (all authors)

  • Venue: Proceedings of the 8th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging
  • Year: 2010

Abstract

As parallel machines grow ever larger, debugging becomes increasingly challenging. In particular, applications at this scale tend to behave non-deterministically, which leads to race-condition bugs. Furthermore, gaining access to these large machines for long debugging sessions is generally infeasible. In this paper, we present a three-step algorithm to perform what we call "processor extraction": a procedure to record the execution of a set of processors from a parallel application and replay any of them in a controlled environment. Our technique causes very little interference in the recorded program thanks to the separation between non-determinism elimination and detailed processor recording. To improve robustness and accuracy, we further augment the algorithm with a self-correction mechanism.