An Efficient Technique for Tracking Nondeterministic Execution and its Applications

  • Authors:
  • Elmootazbellah N. Elnozahy

  • Venue:
  • An Efficient Technique for Tracking Nondeterministic Execution and its Applications
  • Year:
  • 1995

Abstract

This report describes a technique for using instruction counters to track nondeterminism in the execution of operating system kernels and user programs. During normal operation, the operating system records the number of instructions between consecutive nondeterministic events, together with information about their nature. During an analysis phase, the execution is repeated under the control of a monitor, and the nondeterministic events are applied at the same instructions as during the monitored execution. We describe the application of this technique to four areas:

  • Performance monitoring: The technique can be used to instrument an operating system to capture long traces of memory references. Unlike current techniques, it performs the gathering in a postmortem phase and therefore has negligible effect on the computation itself during the monitoring phase. We expect trace periods that are orders of magnitude longer than what existing techniques can capture, with little or no noticeable perturbation to the monitored system itself.
  • Kernel debugging: The technique can be used to repeat the execution of an operating system that precedes a crash due to a Heisenbug. This gives developers a systematic approach for eliminating such bugs during testing.
  • Support for rollback-recovery: Systems that use checkpointing and execution replay can adopt this technique to ensure that execution replay during recovery is identical to the execution before the failure, despite the occurrence of nondeterministic events that cannot be captured efficiently otherwise.
  • Software-based TMR systems: Using this technique, a TMR system based on active replication can be built out of off-the-shelf workstations connected by a general-purpose network. Nondeterministic events occurring in a primary can be emulated on backup machines to ensure identical execution.

We plan to implement this technique on two architectures. The first is an HP platform based on the PA-RISC architecture, which supports instruction counters in hardware. The second is a MIPS-based architecture, in which programs are processed to emulate an instruction counter in software.
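
The record-and-replay idea in the abstract can be illustrated with a toy sketch. This is not the report's implementation: the `Machine` class, the `record` and `replay` functions, the 5% event probability, and the use of a seeded random generator as a stand-in for real nondeterminism are all assumptions made for illustration. One call to `step()` stands in for one instruction, and the log stores the instruction count between consecutive events along with each event's payload.

```python
import random
from dataclasses import dataclass


@dataclass
class Machine:
    """Toy deterministic machine (hypothetical): one step() is one instruction."""
    acc: int = 0
    executed: int = 0  # software instruction counter

    def step(self) -> None:
        self.acc = (self.acc * 31 + 7) % 1_000_003
        self.executed += 1

    def deliver(self, payload: int) -> None:
        # A nondeterministic event (e.g. an interrupt) perturbs the state.
        self.acc ^= payload


def record(total_steps: int, seed: int):
    """Normal operation: log (instructions since previous event, payload)."""
    rng = random.Random(seed)  # stand-in for real-world nondeterminism
    m, log, last = Machine(), [], 0
    for _ in range(total_steps):
        m.step()
        if rng.random() < 0.05:  # assumed event rate, for illustration only
            payload = rng.randrange(1 << 16)
            m.deliver(payload)
            log.append((m.executed - last, payload))
            last = m.executed
    return m.acc, log


def replay(total_steps: int, log):
    """Analysis phase: re-execute and apply each event at the logged count."""
    m = Machine()
    pending = iter(log)
    nxt = next(pending, None)
    countdown = nxt[0] if nxt else None
    for _ in range(total_steps):
        m.step()
        if countdown is not None:
            countdown -= 1
            if countdown == 0:  # same instruction as in the recorded run
                m.deliver(nxt[1])
                nxt = next(pending, None)
                countdown = nxt[0] if nxt else None
    return m.acc


if __name__ == "__main__":
    final_state, event_log = record(1000, seed=42)
    assert replay(1000, event_log) == final_state
```

Because only instruction deltas and event payloads are logged, the recording run does little extra work per event; the expensive analysis (tracing, debugging) can then be attached to the replayed run instead.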