Re-execution of Distributed Programs to Detect Bugs Hidden by Racing

  • Venue: HICSS '97 Proceedings of the 30th Hawaii International Conference on System Sciences: Software Technology and Architecture - Volume 1
  • Year: 1997

Abstract

Finding errors in non-deterministic programs is complicated by the fact that an anomaly may occur during one program execution, and not the next. Our objective is to provide a practical yet powerful testing environment for distributed systems, using re-execution. We focus on re-executing the program under a strictly different message ordering. We show that messages are grouped into waves, such that any two messages from different waves must always be received in the same order. We provide an algorithm that produces a re-execution that maximizes the number of reordered pairs of message delivery events. We also provide an efficient online algorithm for detecting racing messages.
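
To make the notion of racing messages concrete, here is a minimal sketch of one common way to test for it. It uses vector clocks and is not the online algorithm the abstract refers to. It assumes a trace in which every send and receive event carries a vector timestamp, and it treats two messages received by the same process as racing when the send of the later one is not causally after the receipt of the earlier one. The names `happened_before`, `racing_pairs` and `receives_at_p2` are hypothetical, as is the three-process trace.

```python
from typing import List, Tuple

Clock = Tuple[int, ...]
# One received message: (message id, clock of its send, clock of its receive).
Receive = Tuple[str, Clock, Clock]


def happened_before(a: Clock, b: Clock) -> bool:
    """Return True if the event stamped `a` causally precedes the one stamped `b`."""
    return a != b and all(x <= y for x, y in zip(a, b))


def racing_pairs(receives: List[Receive]) -> List[Tuple[str, str]]:
    """Find racing message pairs among the messages received by one process.

    `receives` lists the messages in the order they were received.
    A pair (earlier, later) races if the later message was sent without
    causal knowledge of the earlier receive, so a different run could
    have delivered the two in the opposite order.
    This offline check is quadratic in the number of messages.
    """
    pairs = []
    for i, (first_id, _, first_recv) in enumerate(receives):
        for second_id, second_send, _ in receives[i + 1:]:
            if not happened_before(first_recv, second_send):
                pairs.append((first_id, second_id))
    return pairs


# Hypothetical trace for processes P0, P1, P2, as seen by receiver P2.
# "a" (from P0) and "b" (from P1) are sent independently; "d" is sent by P0
# only after P0 has heard back from P2, so it is causally after both receives.
receives_at_p2: List[Receive] = [
    ("a", (1, 0, 0), (1, 0, 1)),
    ("b", (0, 1, 0), (1, 1, 2)),
    ("d", (3, 1, 3), (3, 1, 4)),
]

if __name__ == "__main__":
    print(racing_pairs(receives_at_p2))
```

In this trace only "a" and "b" can swap places in another execution. Message "d" is causally ordered after both, which resembles the fixed ordering the abstract describes between messages of different waves, although the sketch does not compute waves itself.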