SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Framework for instruction-level tracing and analysis of program executions
Proceedings of the 2nd international conference on Virtual execution environments
Dynamic instrumentation of production systems
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Replay debugging for distributed applications
ATEC '06 Proceedings of the annual conference on USENIX '06 Annual Technical Conference
Execution replay of multiprocessor virtual machines
Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Capo: a software-hardware interface for practical deterministic multiprocessor replay
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
MapReduce: a flexible data processing tool
Communications of the ACM - Amir Pnueli: Ahead of His Time
PRES: probabilistic replay with execution sketching on multiprocessors
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
ODR: output-deterministic replay for multicore debugging
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Respec: efficient online multiprocessor replayvia speculation and external determinism
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
SherLog: error diagnosis by connecting clues from run-time logs
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Execution synthesis: a technique for automated software debugging
Proceedings of the 5th European conference on Computer systems
R2: an application-level kernel for record and replay
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
X-trace: a pervasive network tracing framework
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Friday: global comprehension for distributed replay
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Debug determinism: the sweet spot for replay-based debugging
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
OFRewind: enabling record and replay troubleshooting for networks
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
DEFINED: deterministic execution for interactive control-plane debugging
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Hi-index | 0.00 |
Replay debugging systems enable the reproduction and debugging of non-deterministic failures in production application runs. However, no existing replay system is suitable for datacenter applications like Cassandra, Hadoop, and Hypertable. On these large scale, distributed, and data intensive programs, existing replay methods either incur excessive production recording overheads or are unable to provide high fidelity replay. In this position paper, we hypothesize and empirically verify that control plane determinism is the key to record-efficient and high-fidelity replay of datacenter applications. The key idea behind control plane determinism is that debugging does not always require a precise replica of the original application run. Instead, it often suffices to produce some run that exhibits the original behavior of the control-plane-the application code responsible for controlling and managing data flow through a datacenter system.