Focus replay debugging effort on the control plane

  • Authors: Gautam Altekar; Ion Stoica
  • Affiliations: UC Berkeley; UC Berkeley
  • Venue: HotDep'10: Proceedings of the Sixth International Conference on Hot Topics in System Dependability
  • Year: 2010

Abstract

Replay debugging systems enable the reproduction and debugging of non-deterministic failures in production application runs. However, no existing replay system is suitable for datacenter applications like Cassandra, Hadoop, and Hypertable. On these large-scale, distributed, and data-intensive programs, existing replay methods either incur excessive production recording overheads or are unable to provide high-fidelity replay. In this position paper, we hypothesize and empirically verify that control plane determinism is the key to record-efficient and high-fidelity replay of datacenter applications. The key idea behind control plane determinism is that debugging does not always require a precise replica of the original application run. Instead, it often suffices to produce some run that exhibits the original behavior of the control plane: the application code responsible for controlling and managing data flow through a datacenter system.
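
The idea of reproducing control-plane behavior without an exact replica of the run can be illustrated with a toy sketch. This is not the authors' system. The `Message` type, the `route` function, and the choice to treat operation, key, and payload length as control-plane inputs are all assumptions made for illustration. The recorder drops the payload bytes, and replay substitutes placeholder bytes of the same length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    op: str         # assumed control-plane field: operation to perform
    key: str        # assumed control-plane field: used for routing
    payload: bytes  # data-plane content: the bulk bytes being moved


def route(key: str, num_nodes: int) -> int:
    """Toy control-plane decision: choose a node deterministically from the key."""
    return sum(key.encode()) % num_nodes


def run(messages, num_nodes=3):
    """Process messages and return (control_trace, stored_data)."""
    control_trace = []
    stored = {}
    for m in messages:
        node = route(m.key, num_nodes)
        # The trace captures control decisions only, never payload contents.
        control_trace.append((m.op, m.key, node, len(m.payload)))
        if m.op == "put":
            stored[(node, m.key)] = m.payload
    return control_trace, stored


def record(messages):
    """Log control-plane inputs only; payload bytes are dropped, length is kept."""
    return [(m.op, m.key, len(m.payload)) for m in messages]


def replay(log, num_nodes=3):
    """Re-run from the log using zero-filled payloads of the recorded length."""
    synthetic = [Message(op, key, bytes(size)) for op, key, size in log]
    return run(synthetic, num_nodes)


original = [
    Message("put", "user:1", b"a" * 1024),
    Message("put", "user:2", b"b" * 2048),
    Message("get", "user:1", b""),
]

orig_trace, orig_data = run(original)
replay_trace, replay_data = replay(record(original))

# Control-plane behavior matches even though the stored data differs.
print(orig_trace == replay_trace)
print(orig_data == replay_data)
```

In this sketch the log holds a few small fields per message instead of kilobytes of payload, which is the recording-cost trade the abstract points to. A real application's control plane may also depend on payload contents, which this toy does not model.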