System operators play a critical role in maintaining server dependability yet lack powerful tools to help them do so. To help address this unfulfilled need, we describe Operator Undo, a tool that provides a forgiving operations environment by allowing operators to recover from their own mistakes, from unanticipated software problems, and from intentional or accidental data corruption. Operator Undo starts by intercepting and logging user interactions with a network service before they enter the system, creating a record of user intent. During an undo cycle, all system hard state is physically rewound, allowing the operator to perform arbitrary repairs; after repairs are complete, lost user data is reintegrated into the repaired system by replaying the logged user interactions while tracking and compensating for any resulting externally visible inconsistencies. We describe the design and implementation of an application-neutral framework for Operator Undo, and detail the process by which we instantiated the framework in the form of an undo-capable e-mail store supporting SMTP mail delivery and IMAP mail retrieval. Our proof-of-concept e-mail implementation imposes only a small performance overhead, and can store days or weeks of recovery log on a single disk.
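
The rewind, repair, replay cycle described above can be illustrated with a toy model. The following is a minimal sketch under stated assumptions, not the paper's implementation: `MailStore`, `UndoManager`, the `drop_mail` flag, and the in-memory checkpoint are all hypothetical names invented for illustration. The real system rewinds physical hard state and speaks SMTP and IMAP. The sketch also only reports externally visible differences after replay; it does not compensate for them.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class MailStore:
    """Hypothetical stand-in for the service's hard state."""
    mailboxes: dict = field(default_factory=dict)
    drop_mail: bool = False  # simulated operator misconfiguration

    def deliver(self, user, message):
        if self.drop_mail:
            return  # the message is silently lost
        self.mailboxes.setdefault(user, []).append(message)


class UndoManager:
    """Logs user intent in front of the store and runs undo cycles."""

    def __init__(self, store):
        self.store = store
        self.log = []                           # record of user intent
        self.checkpoint = copy.deepcopy(store)  # state to rewind to

    def handle(self, user, message):
        # Log the interaction before it enters the system.
        self.log.append((user, message))
        self.store.deliver(user, message)

    def undo_cycle(self, repair):
        seen = copy.deepcopy(self.store.mailboxes)  # what users saw so far
        self.store = copy.deepcopy(self.checkpoint)  # rewind
        repair(self.store)                           # operator's repair
        for user, message in self.log:               # replay
            self.store.deliver(user, message)
        # Report mailboxes whose contents differ from what users saw.
        now = self.store.mailboxes
        return {
            user: (seen.get(user, []), now.get(user, []))
            for user in set(seen) | set(now)
            if seen.get(user, []) != now.get(user, [])
        }


mgr = UndoManager(MailStore())
mgr.handle("alice", "m1")
mgr.store.drop_mail = True   # operator mistake made after the checkpoint
mgr.handle("alice", "m2")    # lost by the misconfigured store
mgr.handle("bob", "m3")      # lost by the misconfigured store

# Rewinding already discards the mistake; the repair step is where the
# operator could make further arbitrary changes before replay.
diffs = mgr.undo_cycle(lambda store: setattr(store, "drop_mail", False))
```

After the cycle, the replayed log has restored both lost messages, and `diffs` lists the mailboxes whose contents changed from the users' point of view. In this toy model those differences are simply returned to the caller; handling them is the part the abstract refers to as tracking and compensating for inconsistencies.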