Debugging Parallel Programs with Instant Replay
IEEE Transactions on Computers
Debugging concurrent processes: a case study
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Memory coherence in shared virtual memory systems
ACM Transactions on Computer Systems (TOCS)
An empirical comparison of monitoring algorithms for access anomaly detection
PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Hypervisor-based fault tolerance
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Online data-race detection via coherency guarantees
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Eraser: a dynamic data race detector for multithreaded programs
ACM Transactions on Computer Systems (TOCS)
Deterministic replay of Java multithreaded applications
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
A "flight data recorder" for enabling full-system multiprocessor deterministic replay
Proceedings of the 30th annual international symposium on Computer architecture
Finding stale-value errors in concurrent programs: Research Articles
Concurrency and Computation: Practice & Experience
A serializability violation detector for shared-memory server programs
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging
Proceedings of the 32nd annual international symposium on Computer Architecture
CADRE: Cycle-Accurate Deterministic Replay for Hardware Debugging
DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
Automatic logging of operating system effects to guide application-level architecture simulation
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
A regulated transitive reduction (RTR) for longer memory race recording
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Recording shared memory dependencies using strata
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Debugging operating systems with time-traveling virtual machines
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Flashback: a lightweight extension for rollback and deterministic replay for software debugging
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Automatically classifying benign and harmful data races using replay analysis
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Configuration debugging as search: finding the needle in the haystack
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Execution replay for intrusion analysis
Execution replay for intrusion analysis
Triage: diagnosing production run failures at the user's site
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
DejaView: a personal virtual computer recorder
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Execution replay of multiprocessor virtual machines
Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Better bug reporting with better privacy
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Learning from mistakes: a comprehensive study on real world concurrency bug characteristics
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Rerun: Exploiting Episodes for Lightweight Memory Race Recording
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
CTrigger: exposing atomicity violation bugs from their hiding places
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Capo: a software-hardware interface for practical deterministic multiprocessor replay
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
DMP: deterministic shared memory multiprocessing
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Kendo: efficient deterministic multithreading in software
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
R2: an application-level kernel for record and replay
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Finding and reproducing Heisenbugs in concurrent programs
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Respec: efficient online multiprocessor replayvia speculation and external determinism
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Analyzing multicore dumps to facilitate concurrency bug reproduction
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
ConMem: detecting severe concurrency bugs through an effect-oriented approach
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Butterfly analysis: adapting dataflow analysis to dynamic parallel monitoring
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Execution synthesis: a technique for automated software debugging
Proceedings of the 5th European conference on Computer systems
A trace simplification technique for effective debugging of concurrent programs
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
LEAP: lightweight deterministic multi-processor replay of concurrent java programs
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
Focus replay debugging effort on the control plane
HotDep'10 Proceedings of the Sixth international conference on Hot topics in system dependability
Bypassing races in live applications with execution filters
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Deterministic process groups in dOS
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Stable deterministic multithreading through schedule memoization
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Automating configuration troubleshooting with dynamic information flow analysis
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
InstantCheck: Checking the Determinism of Parallel Programs Using On-the-Fly Incremental Hashing
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Low-overhead bug fingerprinting for fast debugging
RV'10 Proceedings of the First international conference on Runtime verification
Improving software diagnosability via log enhancement
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
DoublePlay: parallelizing sequential logging and replay
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
RCDC: a relaxed consistency deterministic computer
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
2ndStrike: toward manifesting hidden concurrency typestate bugs
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
ConSeq: detecting concurrency bugs through sequential errors
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Striking a new balance between program instrumentation and debugging time
Proceedings of the sixth conference on Computer systems
Dependence-based multi-level tracing and replay for wireless sensor networks debugging
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Debug determinism: the sweet spot for replay-based debugging
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Toward generating reducible replay logs
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Isolating and understanding concurrency errors using reconstructed execution fragments
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Record and transplay: partial checkpointing for replay debugging across heterogeneous systems
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Karma: scalable deterministic record-replay
Proceedings of the international conference on Supercomputing
RADBench: a concurrency bug benchmark suite
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
ORDER: object centric deterministic replay for Java
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Record and transplay: partial checkpointing for replay debugging across heterogeneous systems
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Partial replay of long-running applications
Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering
An efficient static trace simplification technique for debugging concurrent programs
SAS'11 Proceedings of the 18th international conference on Static analysis
Efficient deterministic multithreading through schedule relaxation
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
DoublePlay: Parallelizing Sequential Logging and Replay
ACM Transactions on Computer Systems (TOCS) - Special Issue APLOS 2011
Improving Software Diagnosability via Log Enhancement
ACM Transactions on Computer Systems (TOCS) - Special Issue APLOS 2011
A lightweight and portable approach to making concurrent failures reproducible
FASE'10 Proceedings of the 13th international conference on Fundamental Approaches to Software Engineering
VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
CoreRacer: a practical memory race recorder for multicore x86 TSO processors
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Exploiting parallelism in deterministic shared memory multiprocessing
Journal of Parallel and Distributed Computing
Can deterministic replay be an enabling tool for mobile computing?
Proceedings of the 12th Workshop on Mobile Computing Systems and Applications
Chimera: hybrid program analysis for determinism
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
BugRedux: reproducing field failures for in-house debugging
Proceedings of the 34th International Conference on Software Engineering
BALLERINA: automatic generation and clustering of efficient random unit tests for multithreaded code
Proceedings of the 34th International Conference on Software Engineering
Stride: search-based deterministic replay in polynomial time via bounded linkage
Proceedings of the 34th International Conference on Software Engineering
Tracing and recording interrupts in embedded software
Journal of Systems Architecture: the EUROMICRO Journal
LEAN: simplifying concurrency bug reproduction via replay-supported execution reduction
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Efficient patch-based auditing for web application vulnerabilities
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Automated concurrency-bug fixing
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
All about Eve: execute-verify replication for multi-core servers
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
X-ray: automating root-cause diagnosis of performance anomalies in production software
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
DTAM: dynamic taint analysis of multi-threaded programs for relevancy
Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering
ConMem: Detecting Crash-Triggering Concurrency Bugs through an Effect-Oriented Approach
ACM Transactions on Software Engineering and Methodology (TOSEM)
Scalable deterministic replay in a parallel full-system emulator
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
RaceFree: an efficient multi-threading model for determinism
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Deterministic Replay Using Global Clock
ACM Transactions on Architecture and Code Optimization (TACO)
Transparent mutable replay for multicore debugging and patch validation
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Cyrus: unintrusive application-level record-replay for replay parallelism
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
CONCURRIT: a domain specific language for reproducing concurrency bugs
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
CLAP: recording local executions to reproduce concurrency failures
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Automated debugging for arbitrarily long executions
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
OCTET: capturing and controlling cross-thread dependences efficiently
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Parrot: a practical runtime for deterministic, stable, and reliable threads
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Towards effective and efficient search-based deterministic replay
Proceedings of the 9th Workshop on Hot Topics in Dependable Systems
Semi-automated debugging via binary search through a process lifetime
Proceedings of the Seventh Workshop on Programming Languages and Operating Systems
Synchronization identification through on-the-fly test
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
RelaxReplay: record and replay for relaxed-consistency multiprocessors
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Trace driven dynamic deadlock detection and reproduction
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Infrastructure-free logging and replay of concurrent execution on multiple cores
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Global property violation detection and diagnosis for wireless sensor networks
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Hi-index | 0.00 |
Bug reproduction is critically important for diagnosing a production-run failure. Unfortunately, reproducing a concurrency bug on multi-processors (e.g., multi-core) is challenging. Previous techniques either incur large overhead or require new non-trivial hardware extensions. This paper proposes a novel technique called PRES (probabilistic replay via execution sketching) to help reproduce concurrency bugs on multi-processors. It relaxes the past (perhaps idealistic) objective of "reproducing the bug on the first replay attempt" to significantly lower production-run recording overhead. This is achieved by (1) recording only partial execution information (referred to as "sketches") during the production run, and (2) relying on an intelligent replayer during diagnosis time (when performance is less critical) to systematically explore the unrecorded non-deterministic space and reproduce the bug. With only partial information, our replayer may require more than one coordinated replay run to reproduce a bug. However, after a bug is reproduced once, PRES can reproduce it every time. We implemented PRES along with five different execution sketching mechanisms. We evaluated them with 11 representative applications, including 4 servers, 3 desktop/client applications, and 4 scientific/graphics applications, with 13 real-world concurrency bugs of different types, including atomicity violations, order violations and deadlocks. PRES (with synchronization or system call sketching) significantly lowered the production-run recording overhead of previous approaches (by up to 4416 times), while still reproducing most tested bugs in fewer than 10 replay attempts. Moreover, PRES scaled well with the number of processors; PRES's feedback generation from unsuccessful replays is critical in bug reproduction.