Optimistic recovery in distributed systems
ACM Transactions on Computer Systems (TOCS)
Recovery management in QuickSilver
ACM Transactions on Computer Systems (TOCS)
Distributed programming in Argus
Communications of the ACM
Debugging concurrent processes: a case study
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
ACM Transactions on Computer Systems (TOCS)
Recovery in distributed systems using asynchronous message logging and checkpointing
PODC '88 Proceedings of the seventh annual ACM Symposium on Principles of distributed computing
On-the-fly detection of access anomalies
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Real-time, concurrent checkpoint for parallel programs
PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
An efficient cache-based access anomaly detection scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Synchronization without contention
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Race Frontier: reproducing data races in parallel-program debugging
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Detecting violations of sequential consistency
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Detecting data races on weak memory systems
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
LCLint: a tool for using specifications to check code
SIGSOFT '94 Proceedings of the 2nd ACM SIGSOFT symposium on Foundations of software engineering
eNVy: a non-volatile, main memory storage system
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Replay for concurrent non-deterministic shared-memory applications
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
The Rio file cache: surviving operating system crashes
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Dealing with disaster: surviving misbehaved kernel extensions
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Trade-offs in implementing causal message logging protocols
PODC '96 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing
Eraser: a dynamic data race detector for multithreaded programs
ACM Transactions on Computer Systems (TOCS)
Free transactions with Rio Vista
Proceedings of the sixteenth ACM symposium on Operating systems principles
Progressive Retry for Software Failure Recovery in Message-Passing Applications
IEEE Transactions on Computers
Deterministic replay of Java multithreaded applications
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
IEEE Transactions on Parallel and Distributed Systems
Fast cluster failover using virtual memory-mapped communication
ICS '99 Proceedings of the 13th international conference on Supercomputing
RecPlay: a fully integrated practical record/replay system
ACM Transactions on Computer Systems (TOCS)
Blueprints for high availability: designing resilient distributed systems
Blueprints for high availability: designing resilient distributed systems
Type-based race detection for Java
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
A Protocol-Centric Approach to on-the-Fly Race Detection
IEEE Transactions on Parallel and Distributed Systems
BASE: using abstraction to improve fault tolerance
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Bugs as deviant behavior: a general approach to inferring errors in systems code
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
CCured: type-safe retrofitting of legacy code
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
OOPSLA '01 Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
CLIP: a checkpointing tool for message-passing parallel programs
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
A system and language for building system-specific, static analyses
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Extended static checking for Java
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
Tracking down software bugs using automatic anomaly detection
Proceedings of the 24th International Conference on Software Engineering
Ownership types for safe programming: preventing data races and deadlocks
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Elimination of Java array bounds checks in the presence of indirection
JGI '02 Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande
Using redundancies to find errors
Proceedings of the 10th ACM SIGSOFT symposium on Foundations of software engineering
Debugging via Run-Time Type Checking
FASE '01 Proceedings of the 4th International Conference on Fundamental Approaches to Software Engineering
Data Replication Strategies for Fault Tolerance and Availability on Commodity Clusters
DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8)
Reducing Recovery Time in a Small Recursively Restartable System
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
Supporting nondeterministic execution in fault-tolerant systems
FTCS '96 Proceedings of the The Twenty-Sixth Annual International Symposium on Fault-Tolerant Computing (FTCS '96)
Integrating Checkpointing with Transaction Processing
FTCS '97 Proceedings of the 27th International Symposium on Fault-Tolerant Computing (FTCS '97)
A message system supporting fault tolerance
SOSP '83 Proceedings of the ninth ACM symposium on Operating systems principles
How to recover efficiently and asynchronously when optimism fails
ICDCS '96 Proceedings of the 16th International Conference on Distributed Computing Systems (ICDCS '96)
ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes
Proceedings of the 30th annual international symposium on Computer architecture
Completely Asynchronous Optimistic Recovery with Minimal Rollbacks
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Software Rejuvenation: Analysis, Module and Applications
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Why Optimistic Message Logging Has Not Been Used in Telecommunications Systems
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Checkpointing and Its Applications
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Intrusion Detection via Static Analysis
SP '01 Proceedings of the 2001 IEEE Symposium on Security and Privacy
Recovery Oriented Computing (ROC): Motivation, Definition, Techniques,
Recovery Oriented Computing (ROC): Motivation, Definition, Techniques,
Using model checking to debug device firmware
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Proactive recovery in a Byzantine-fault-tolerant system
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Exploring failure transparency and the limits of generic recovery
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
A transactional memory service in an extensible operating system
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging
Proceedings of the 32nd annual international symposium on Computer Architecture
Jockey: a user-space library for record-replay debugging
Proceedings of the sixth international symposium on Automated analysis-driven debugging
Speculative execution in a distributed file system
Proceedings of the twentieth ACM symposium on Operating systems principles
Rx: treating bugs as allergies---a safe method to survive software failures
Proceedings of the twentieth ACM symposium on Operating systems principles
Automatic diagnosis and response to memory corruption vulnerabilities
Proceedings of the 12th ACM conference on Computer and communications security
Selective early request termination for busy internet services
Proceedings of the 15th international conference on World Wide Web
S2DB: a novel simulation-based debugger for sensor network applications
EMSOFT '06 Proceedings of the 6th ACM & IEEE International conference on Embedded software
Replayer: automatic protocol replay by binary analysis
Proceedings of the 13th ACM conference on Computer and communications security
ExecRecorder: VM-based full-system replay for attack analysis and system recovery
Proceedings of the 1st workshop on Architectural and system support for improving software dependability
Speculative execution in a distributed file system
ACM Transactions on Computer Systems (TOCS)
Debugging operating systems with time-traveling virtual machines
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
A Technique for Enabling and Supporting Debugging of Field Failures
ICSE '07 Proceedings of the 29th international conference on Software Engineering
Correlating multi-session attacks via replay
HOTDEP'06 Proceedings of the 2nd conference on Hot Topics in System Dependability - Volume 2
Automatic on-line failure diagnosis at the end-user site
HOTDEP'06 Proceedings of the 2nd conference on Hot Topics in System Dependability - Volume 2
Treating bugs as allergies: a safe method for surviving software failures
HOTOS'05 Proceedings of the 10th conference on Hot Topics in Operating Systems - Volume 10
Replay debugging for distributed applications
ATEC '06 Proceedings of the annual conference on USENIX '06 Annual Technical Conference
Sweeper: a lightweight end-to-end system for defending against fast worms
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Enabling tracing Of long-running multithreaded programs via dynamic execution reduction
Proceedings of the 2007 international symposium on Software testing and analysis
Rx: Treating bugs as allergies—a safe method to survive software failures
ACM Transactions on Computer Systems (TOCS)
Efficient checkpointing of java software using context-sensitive capture and replay
Proceedings of the the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
Triage: diagnosing production run failures at the user's site
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Staged deployment in mirage, an integrated software upgrade testing and distribution system
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Simulation-based augmented reality for sensor network development
Proceedings of the 5th international conference on Embedded networked sensor systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Relaxed determinism: making redundant execution on multiprocessors practical
HOTOS'07 Proceedings of the 11th USENIX workshop on Hot topics in operating systems
Transparent checkpoint-restart of multiple processes on commodity operating systems
ATC'07 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference
Decoupling dynamic program analysis from execution in virtual environments
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
ReCrash: Making Software Failures Reproducible by Preserving Object States
ECOOP '08 Proceedings of the 22nd European conference on Object-Oriented Programming
Fast and Black-box Exploit Detection and Signature Generation for Commodity Software
ACM Transactions on Information and System Security (TISSEC)
Using virtual machines to do cross-layer damage assessment
Proceedings of the 1st ACM workshop on Virtual machine security
Capo: a software-hardware interface for practical deterministic multiprocessor replay
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
First-aid: surviving and preventing memory management bugs during production runs
Proceedings of the 4th ACM European conference on Computer systems
Transparent checkpoints of closed distributed systems in Emulab
Proceedings of the 4th ACM European conference on Computer systems
A systematic approach to system state restoration during storage controller micro-recovery
FAST '09 Proccedings of the 7th conference on File and storage technologies
RT-replayer: a record-replay architecture for embedded real-time software debugging
Proceedings of the 2009 ACM symposium on Applied Computing
FlashBox: a system for logging non-deterministic events in deployed embedded systems
Proceedings of the 2009 ACM symposium on Applied Computing
Proceedings of the 4th International Symposium on Information, Computer, and Communications Security
Self-recovery in server programs
Proceedings of the 2009 international symposium on Memory management
SigRace: signature-based data race detection
Proceedings of the 36th annual international symposium on Computer architecture
ECMon: exposing cache events for monitoring
Proceedings of the 36th annual international symposium on Computer architecture
MODIST: transparent model checking of unmodified distributed systems
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
CrystalBall: predicting and preventing inconsistencies in deployed distributed systems
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Software Profiling for Deterministic Replay Debugging of User Code
Proceedings of the 2006 conference on New Trends in Software Methodologies, Tools and Techniques: Proceedings of the fifth SoMeT_06
ReCrashJ: a tool for capturing and reproducing program crashes in deployed applications
Proceedings of the the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
PRES: probabilistic replay with execution sketching on multiprocessors
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Availability-sensitive intrusion recovery
Proceedings of the 1st ACM workshop on Virtual machine security
DSF: a common platform for distributed systems research and development
Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
Offline symbolic analysis for multi-processor execution replay
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Predicting and preventing inconsistencies in deployed distributed systems
ACM Transactions on Computer Systems (TOCS)
Multi-stage replay with crosscut
Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Respec: efficient online multiprocessor replayvia speculation and external determinism
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Analyzing multicore dumps to facilitate concurrency bug reproduction
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Otherworld: giving applications a chance to survive OS kernel crashes
Proceedings of the 5th European conference on Computer systems
Alhambra: a system for creating, enforcing, and testing browser security policies
Proceedings of the 19th international conference on World wide web
PinPlay: a framework for deterministic replay and reproducible analysis of parallel programs
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Prospect: a compiler framework for speculative parallelization
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Detection and diagnosis of control interception
ICICS'07 Proceedings of the 9th international conference on Information and communications security
An empirical study of reported bugs in server software with implications for automated bug diagnosis
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1
Transparent, lightweight application execution replay on commodity multiprocessor operating systems
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
DSF: a common platform for distributed systems research and development
Middleware'09 Proceedings of the ACM/IFIP/USENIX 10th international conference on Middleware
"Otherworld": giving applications a chance to survive OS kernel crashes
HotDep'08 Proceedings of the Fourth conference on Hot topics in system dependability
Lightweight, high-resolution monitoring for troubleshooting production systems
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
R2: an application-level kernel for record and replay
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Testing closed-source binary device drivers with DDT
USENIXATC'10 Proceedings of the 2010 USENIX conference on USENIX annual technical conference
Language-based replay via data flow cut
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
Stable deterministic multithreading through schedule memoization
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
DoublePlay: parallelizing sequential logging and replay
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
2ndStrike: toward manifesting hidden concurrency typestate bugs
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Correlating multi-session attacks via replay
HotDep'06 Proceedings of the Second conference on Hot topics in system dependability
Automatic on-line failure diagnosis at the end-user site
HotDep'06 Proceedings of the Second conference on Hot topics in system dependability
WiDS checker: combating bugs in distributed systems
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Friday: global comprehension for distributed replay
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Record and transplay: partial checkpointing for replay debugging across heterogeneous systems
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Record and transplay: partial checkpointing for replay debugging across heterogeneous systems
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Locating failure-inducing environment changes
Proceedings of the 10th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools
Partial replay of long-running applications
Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering
Summarized trace indexing and querying for scalable back-in-time debugging
Proceedings of the 25th European conference on Object-oriented programming
BLR-D: applying bilinear logistic regression to factored diagnosis problems
SLAML '11 Managing Large-scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques
URDB: a universal reversible debugger based on decomposing debugging histories
PLOS '11 Proceedings of the 6th Workshop on Programming Languages and Operating Systems
Efficient deterministic multithreading through schedule relaxation
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Pervasive detection of process races in deployed systems
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
BLR-D: applying bilinear logistic regression to factored diagnosis problems
ACM SIGOPS Operating Systems Review
DoublePlay: Parallelizing Sequential Logging and Replay
ACM Transactions on Computer Systems (TOCS) - Special Issue APLOS 2011
VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
CoreRacer: a practical memory race recorder for multicore x86 TSO processors
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Can deterministic replay be an enabling tool for mobile computing?
Proceedings of the 12th Workshop on Mobile Computing Systems and Applications
Chimera: hybrid program analysis for determinism
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Stride: search-based deterministic replay in polynomial time via bounded linkage
Proceedings of the 34th International Conference on Software Engineering
CCTR: An efficient point-to-point memory race recorder implemented in chunks
Microprocessors & Microsystems
X-ray: automating root-cause diagnosis of performance anomalies in production software
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Energy-efficient and high-performance software architecture for storage class memory
ACM Transactions on Embedded Computing Systems (TECS)
Transparent mutable replay for multicore debugging and patch validation
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Cyrus: unintrusive application-level record-replay for replay parallelism
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
An efficient deterministic record-replay with separate dependencies
Computers and Electrical Engineering
Proceedings of the 40th Annual International Symposium on Computer Architecture
RERAN: timing- and touch-sensitive record and replay for Android
Proceedings of the 2013 International Conference on Software Engineering
Chronicler: lightweight recording to reproduce field failures
Proceedings of the 2013 International Conference on Software Engineering
Safe software updates via multi-version execution
Proceedings of the 2013 International Conference on Software Engineering
Value-deterministic search-based replay for android multithreaded applications
Proceedings of the 2013 Research in Adaptive and Convergent Systems
Semi-automated debugging via binary search through a process lifetime
Proceedings of the Seventh Workshop on Programming Languages and Operating Systems
DEFINED: deterministic execution for interactive control-plane debugging
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
RelaxReplay: record and replay for relaxed-consistency multiprocessors
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Enabling bring-your-own-device using mobile application instrumentation
IBM Journal of Research and Development
Hi-index | 0.00 |
Software robustness has significant impact on system availability. Unfortunately, finding software bugs is a very challenging task because many bugs are hard to reproduce. While debugging a program, it would be very useful to rollback a crashed program to a previous execution point and deterministically re-execute the "buggy" code region. However, most previous work on rollback and replay support was designed to survive hardware or operating system failures, and is therefore too heavyweight for the fine-grained rollback and replay needed for software debugging. This paper presents Flashback, a lightweight OS extension that provides fine-grained rollback and replay to help debug software. Flashback uses shadow processes to efficiently roll back in-memory state of a process, and logs a process' interactions with the system to support deterministic replay. Both shadow processes and logging of system calls are implemented in a lightweight fashion specifically designed for the purpose of software debugging. We have implemented a prototype of Flashback in the Linux operating system. Our experimental results with micro-benchmarks and real applications show that Flashback adds little overhead and can quickly roll back a debugged program to a previous execution point and deterministically replay from that point.