Debugging Parallel Programs with Instant Replay
IEEE Transactions on Computers
Implementing fault-tolerant services using the state machine approach: a tutorial
ACM Computing Surveys (CSUR)
The process group approach to reliable distributed computing
Communications of the ACM
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Horus: a flexible group communication system
Communications of the ACM
Deterministic replay of Java multithreaded applications
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
The design, implementation, and evaluation of Jade
ACM Transactions on Programming Languages and Systems (TOPLAS)
httperf—a tool for measuring web server performance
ACM SIGMETRICS Performance Evaluation Review
RecPlay: a fully integrated practical record/replay system
ACM Transactions on Computer Systems (TOCS)
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Supporting nondeterministic execution in fault-tolerant systems
FTCS '96 Proceedings of the The Twenty-Sixth Annual International Symposium on Fault-Tolerant Computing (FTCS '96)
Enforcing Determinism for the Consistent Replication of Multithreaded CORBA Applications
SRDS '99 Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems
A "flight data recorder" for enabling full-system multiprocessor deterministic replay
Proceedings of the 30th annual international symposium on Computer architecture
ReVirt: enabling intrusion analysis through virtual-machine logging and replay
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
The design and implementation of Zap: a system for migrating computing environments
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Virtualizing Transactional Memory
Proceedings of the 32nd annual international symposium on Computer Architecture
Jockey: a user-space library for record-replay debugging
Proceedings of the sixth international symposium on Automated analysis-driven debugging
SHIM: a deterministic model for heterogeneous embedded systems
Proceedings of the 5th ACM international conference on Embedded software
Active Replication of Multithreaded Applications
IEEE Transactions on Parallel and Distributed Systems
A regulated transitive reduction (RTR) for longer memory race recording
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
HOTDEP'06 Proceedings of the 2nd conference on Hot Topics in System Dependability - Volume 2
Execution replay of multiprocessor virtual machines
Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Relaxed determinism: making redundant execution on multiprocessors practical
HOTOS'07 Proceedings of the 11th USENIX workshop on Hot topics in operating systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Capo: a software-hardware interface for practical deterministic multiprocessor replay
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
DMP: deterministic shared memory multiprocessing
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Kendo: efficient deterministic multithreading in software
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Isolating web programs in modern browser architectures
Proceedings of the 4th ACM European conference on Computer systems
PRES: probabilistic replay with execution sketching on multiprocessors
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
ODR: output-deterministic replay for multicore debugging
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
A type and effect system for deterministic parallel Java
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
CoreDet: a compiler and runtime system for deterministic multithreaded execution
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Respec: efficient online multiprocessor replayvia speculation and external determinism
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
SherLog: error diagnosis by connecting clues from run-time logs
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Execution synthesis: a technique for automated software debugging
Proceedings of the 5th European conference on Computer systems
Transparent, lightweight application execution replay on commodity multiprocessor operating systems
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
R2: an application-level kernel for record and replay
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Finding and reproducing Heisenbugs in concurrent programs
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Efficient system-enforced deterministic parallelism
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Efficient system-enforced deterministic parallelism
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
RCDC: a relaxed consistency deterministic computer
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Non-deterministic parallelism considered useful
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Automated atomicity-violation fixing
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Towards practical avoidance of information leakage in enterprise networks
HotSec'11 Proceedings of the 6th USENIX conference on Hot topics in security
Dthreads: efficient deterministic multithreading
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Efficient deterministic multithreading through schedule relaxation
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Pervasive detection of process races in deployed systems
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Toward a formal semantic framework for deterministic parallel programming
DISC'11 Proceedings of the 25th international conference on Distributed computing
DoublePlay: Parallelizing Sequential Logging and Replay
ACM Transactions on Computer Systems (TOCS) - Special Issue APLOS 2011
Internally deterministic parallel algorithms can be fast
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Aikido: accelerating shared data dynamic analyses
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Efficient system-enforced deterministic parallelism
Communications of the ACM
Improving server applications with system transactions
Proceedings of the 7th ACM european conference on Computer Systems
The datacenter needs an operating system
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
Exploiting parallelism in deterministic shared memory multiprocessing
Journal of Parallel and Distributed Computing
Sound and precise analysis of parallel programs through schedule specialization
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Understanding the interleaving-space overlap across inputs and software versions
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Execution privatization for scheduler-oblivious concurrent programs
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Automated concurrency-bug fixing
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
All about Eve: execute-verify replication for multi-core servers
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Deterministic Replay Using Global Clock
ACM Transactions on Architecture and Code Optimization (TACO)
GPUDet: a deterministic GPU architecture
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
DDOS: taming nondeterminism in distributed systems
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Proceedings of the 8th ACM European Conference on Computer Systems
Automated debugging for arbitrarily long executions
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Lazy tree mapping: generalizing and scaling deterministic parallelism
Proceedings of the 4th Asia-Pacific Workshop on Systems
Input-covering schedules for multithreaded programs
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Parrot: a practical runtime for deterministic, stable, and reliable threads
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
DEFINED: deterministic execution for interactive control-plane debugging
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Deterministic galois: on-demand, portable and parameterless
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Efficient deterministic multithreading without global barriers
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.02 |
Current multiprocessor systems execute parallel and concurrent software nondeterministically: even when given precisely the same input, two executions of the same program may produce different output. This severely complicates debugging, testing, and automatic replication for fault-tolerance. Previous efforts to address this issue have focused primarily on record and replay, but making execution actually deterministic would address the problem at the root. Our goals in this work are twofold: (1) to provide fully deterministic execution of arbitrary, unmodified, multithreaded programs as an OS service; and (2) to make all sources of intentional nondeterminism, such as network I/O, be explicit and controllable. To this end we propose a new OS abstraction, the Deterministic Process Group (DPG). All communication between threads and processes internal to a DPG happens deterministically, including implicit communication via shared-memory accesses, as well as communication via OS channels such as pipes, signals, and the filesystem. To deal with fundamentally nondeterministic external events, our abstraction includes the shim layer, a programmable interface that interposes on all interaction between a DPG and the external world, making determinism useful even for reactive applications. We implemented the DPG abstraction as an extension to Linux and demonstrate its benefits with three use cases: plain deterministic execution; replicated execution; and record and replay by logging just external input. We evaluated our implementation on both parallel and reactive workloads, including Apache, Chromium, and PARSEC.