Horus: a flexible group communication system
Communications of the ACM
ACM Transactions on Computer Systems (TOCS)
Deterministic replay of Java multithreaded applications
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
RecPlay: a fully integrated practical record/replay system
ACM Transactions on Computer Systems (TOCS)
Deterministic Replay of Distributed Java Applications
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
ReVirt: enabling intrusion analysis through virtual-machine logging and replay
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Jockey: a user-space library for record-replay debugging
Proceedings of the sixth international symposium on Automated analysis-driven debugging
SHIM: a deterministic model for heterogeneous embedded systems
Proceedings of the 5th ACM international conference on Embedded software
Execution replay of multiprocessor virtual machines
Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
DMP: deterministic shared memory multiprocessing
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Kendo: efficient deterministic multithreading in software
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Architecting phase change memory as a scalable dram alternative
Proceedings of the 36th annual international symposium on Computer architecture
CoreDet: a compiler and runtime system for deterministic multithreaded execution
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Transparent, lightweight application execution replay on commodity multiprocessor operating systems
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
The case for determinism in database systems
Proceedings of the VLDB Endowment
Deterministic process groups in dOS
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Efficient system-enforced deterministic parallelism
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
DoublePlay: parallelizing sequential logging and replay
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
RCDC: a relaxed consistency deterministic computer
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Calvin: Deterministic or not? Free will to choose
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Fast crash recovery in RAMCloud
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Parrot: a practical runtime for deterministic, stable, and reliable threads
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
DEFINED: deterministic execution for interactive control-plane debugging
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Finding trojan message vulnerabilities in distributed systems
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
Nondeterminism complicates the development and management of distributed systems, and arises from two main sources: the local behavior of each individual node as well as the behavior of the network connecting them. Taming nondeterminism effectively requires dealing with both sources. This paper proposes DDOS, a system that leverages prior work on deterministic multithreading to offer: 1) space-efficient record/replay of distributed systems; and 2) fully deterministic distributed behavior. Leveraging deterministic behavior at each node makes outgoing messages strictly a function of explicit inputs. This allows us to record the system by logging just message's arrival time, not the contents. Going further, we propose and implement an algorithm that makes all communication between nodes deterministic by scheduling communication onto a global logical timeline. We implement both algorithms in a system called DDOS and evaluate our system with parallel scientific applications, an HTTP/memcached system and a distributed microbenchmark with a high volume of peer-to-peer communication. Our results show up to two orders of magnitude reduction in log size of record/replay, and that distributed systems can be made deterministic with an order of magnitude of overhead.