Debugging Parallel Programs with Instant Replay
IEEE Transactions on Computers
A software instruction counter
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
IGOR: a system for program debugging via reversible execution
PADD '88 Proceedings of the 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging
Hardware-assisted replay of multiprocessor programs
PADD '91 Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging
Optimal tracing and replay for debugging shared-memory parallel programs
PADD '93 Proceedings of the 1993 ACM/ONR workshop on Parallel and distributed debugging
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Hypervisor-based fault tolerance
ACM Transactions on Computer Systems (TOCS) - Special issue on operating system principles
Replay for concurrent non-deterministic shared-memory applications
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
RecPlay: a fully integrated practical record/replay system
ACM Transactions on Computer Systems (TOCS)
Deciding when to forget in the Elephant file system
Proceedings of the seventeenth ACM symposium on Operating systems principles
Slipstream processors: improving both performance and fault tolerance
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Enhancing software reliability with speculative threads
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A Perturbation-Free Replay Platform for Cross-Optimized Multithreaded Applications
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Master/slave speculative parallelization
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
A "flight data recorder" for enabling full-system multiprocessor deterministic replay
Proceedings of the 30th annual international symposium on Computer architecture
ReVirt: enabling intrusion analysis through virtual-machine logging and replay
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Ext3cow: a time-shifting file system for regulatory compliance
ACM Transactions on Storage (TOS)
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging
Proceedings of the 32nd annual international symposium on Computer Architecture
Speculative execution in a distributed file system
Proceedings of the twentieth ACM symposium on Operating systems principles
Automatic logging of operating system effects to guide application-level architecture simulation
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Recording shared memory dependencies using strata
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Framework for instruction-level tracing and analysis of program executions
Proceedings of the 2nd international conference on Virtual execution environments
Debugging operating systems with time-traveling virtual machines
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Flashback: a lightweight extension for rollback and deterministic replay for software debugging
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Triage: diagnosing production run failures at the user's site
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
Execution replay of multiprocessor virtual machines
Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Parallelizing security checks on commodity hardware
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Rerun: Exploiting Episodes for Lightweight Memory Race Recording
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Decoupling dynamic program analysis from execution in virtual environments
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
DMP: deterministic shared memory multiprocessing
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Kendo: efficient deterministic multithreading in software
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Fast Track: A Software System for Speculative Program Optimization
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
PRES: probabilistic replay with execution sketching on multiprocessors
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
ODR: output-deterministic replay for multicore debugging
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Grace: safe multithreaded programming for C/C++
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
A type and effect system for deterministic parallel Java
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Offline symbolic analysis for multi-processor execution replay
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
CoreDet: a compiler and runtime system for deterministic multithreaded execution
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Respec: efficient online multiprocessor replayvia speculation and external determinism
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Analyzing multicore dumps to facilitate concurrency bug reproduction
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Execution synthesis: a technique for automated software debugging
Proceedings of the 5th European conference on Computer systems
Prospect: a compiler framework for speculative parallelization
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Transparent, lightweight application execution replay on commodity multiprocessor operating systems
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Proceedings of the 37th annual international symposium on Computer architecture
quFiles: the right file at the right time
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Finding and reproducing Heisenbugs in concurrent programs
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Deterministic process groups in dOS
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Efficient system-enforced deterministic parallelism
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Detecting and surviving data races using complementary schedules
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Hi-index | 0.00 |
Deterministic replay systems record and reproduce the execution of a hardware or software system. In contrast to replaying execution on uniprocessors, deterministic replay on multiprocessors is very challenging to implement efficiently because of the need to reproduce the order of or the values read by shared memory operations performed by multiple threads. In this paper, we present DoublePlay, a new way to efficiently guarantee replay on commodity multiprocessors. Our key insight is that one can use the simpler and faster mechanisms of single-processor record and replay, yet still achieve the scalability offered by multiple cores, by using an additional execution to parallelize the record and replay of an application. DoublePlay timeslices multiple threads on a single processor, then runs multiple time intervals (epochs) of the program concurrently on separate processors. This strategy, which we call uniparallelism, makes logging much easier because each epoch runs on a single processor (so threads in an epoch never simultaneously access the same memory) and different epochs operate on different copies of the memory. Thus, rather than logging the order of shared-memory accesses, we need only log the order in which threads in an epoch are timesliced on the processor. DoublePlay runs an additional execution of the program on multiple processors to generate checkpoints so that epochs run in parallel. We evaluate DoublePlay on a variety of client, server, and scientific parallel benchmarks; with spare cores, DoublePlay reduces logging overhead to an average of 15% with two worker threads and 28% with four threads.