Race recording for multithreaded deterministic replay using multiprocessor hardware

  • Authors:
  • Mark D. Hill; Rastislav Bodik; Min Xu

  • Affiliations:
  • The University of Wisconsin - Madison; The University of Wisconsin - Madison; The University of Wisconsin - Madison

  • Venue:
  • Race recording for multithreaded deterministic replay using multiprocessor hardware
  • Year:
  • 2006

Abstract

Multithreaded deterministic replay has important applications in cyclic debugging, fault tolerance, intrusion analysis, and more. Memory race recording is a key technology for multithreaded deterministic replay. This dissertation proposes a new race recording algorithm and describes a novel implementation of a race recorder based on multiprocessor cache coherence mechanisms. As a result of the new algorithm and the novel implementation, the new race recorder is significantly more efficient and less expensive than existing memory race recorders. Notably, the recorder simultaneously achieves several desired features: (1) long recording, by reducing the recorder log size to around one byte per thousand instructions; (2) always-on recording, by reducing the runtime overhead to less than 2%; (3) inexpensive recording, by reducing the timestamp memory size (which is different from the log size) to approximately 24 kilobytes per processor; (4) broad applicability, by supporting programs with data races and by supporting multiprocessor systems with both the Sequential Consistency (SC) and the Total Store Order (TSO) memory consistency models.

Our improvements stem from several ideas: (1) a method of creating artificial dependencies that allows reduction and compression of the log, yet still allows parallel replay; (2) a method of approximating timestamps that significantly reduces the chip area cost; (3) a method of hardware coherence piggybacking that enables race recording with extremely low runtime overhead, even for programs with data races; (4) a method of order-value-hybrid recording that supports race recording on multiprocessor systems with the TSO memory consistency model.

We evaluate the recorder with full-system simulation of a Chip MultiProcessing (CMP) system running commercial workloads. Our results indicate that the recorder can be always-on and that the log size is around one byte per thousand instructions (55 to 180 KB per second per 2 GHz processor).
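
To put the reported log figures in perspective, the following is a minimal back-of-the-envelope sketch, not part of the dissertation. It assumes 1 KB = 1000 bytes, takes the "one byte per thousand instructions" figure as exact rather than approximate, and uses hypothetical function names and a hypothetical 1 GB storage budget chosen only for illustration.

```python
# Back-of-the-envelope arithmetic for the log-size figures in the abstract.
# Assumptions (not from the source): 1 KB = 1000 bytes, and the
# "one byte per thousand instructions" rate is treated as exact.

BYTES_PER_KILO_INSTR = 1.0  # reported log density (approximate in the source)


def instructions_per_second(log_bytes_per_sec: float) -> float:
    """Instruction rate implied by a given log growth rate."""
    return log_bytes_per_sec / BYTES_PER_KILO_INSTR * 1000.0


def recording_hours(storage_bytes: float, log_bytes_per_sec: float) -> float:
    """Hours of per-processor recording that fit in a storage budget."""
    return storage_bytes / log_bytes_per_sec / 3600.0


if __name__ == "__main__":
    low, high = 55_000.0, 180_000.0  # 55 KB/s and 180 KB/s, from the abstract
    budget = 1e9                     # hypothetical 1 GB log budget
    for rate in (low, high):
        print(
            f"{rate / 1000:.0f} KB/s: "
            f"{instructions_per_second(rate) / 1e6:.0f} M instr/s, "
            f"{recording_hours(budget, rate):.2f} h per GB"
        )
```

Under these assumptions, the reported range corresponds to roughly 1.5 to 5 hours of recording per processor for each gigabyte of log storage, which is consistent with the abstract's claim that the small log enables long recording.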