Self-healing multitier architectures using cascading rescue points

Authors:
Angeliki Zavou;Georgios Portokalidis;Angelos D. Keromytis
Affiliations:
Columbia University, New York, NY;Columbia University, New York, NY;Columbia University, New York, NY
Venue:
Proceedings of the 28th Annual Computer Security Applications Conference
Year:
2012

Abstract

Software bugs and vulnerabilities cause serious problems for both home users and the Internet infrastructure, limiting the availability of Internet services, causing loss of data, and reducing system integrity. Software self-healing using rescue points (RPs) is a known mechanism for recovering from unforeseen errors. However, applying it to multitier architectures can be problematic because certain actions, such as transmitting data over the network, cannot be undone. We propose cascading rescue points (CRPs) to address the state inconsistency issues that can arise when traditional RPs are used to recover from errors in interconnected applications. With CRPs, when an application executing within an RP transmits data, the remote peer is notified to perform a checkpoint as well, so the communicating entities checkpoint in a coordinated but loosely coupled way. Notifications are also sent when an RP successfully completes execution and when recovery is initiated, so that remote parties can take the appropriate action. We developed a tool that implements CRPs by dynamically instrumenting binaries and transparently injecting notifications into the already established TCP channels between applications. We tested our tool with various applications, including the MySQL and Apache servers, and show that it allows them to recover successfully from errors while incurring moderate overhead of between 4.54% and 71.56%.
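
The three notifications described in the abstract (checkpoint on transmit, completion, and recovery) can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the authors' tool: the real system instruments binaries and injects notifications into TCP channels, whereas here the hypothetical `Peer` class and its methods (`enter_rp`, `send`, `commit`, `recover`) are invented for illustration and stand in for processes and network messages. Nested or overlapping rescue points are deliberately ignored.

```python
import copy


class Peer:
    """Toy application tier (hypothetical) whose state can be checkpointed."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.checkpoint = None  # saved state while inside a rescue point
        self.notified = []      # peers told to checkpoint during this RP

    def enter_rp(self):
        # Take a checkpoint unless one is already active.
        if self.checkpoint is None:
            self.checkpoint = copy.deepcopy(self.state)

    def send(self, peer, key, value):
        # Transmitting inside an RP: notify the receiver to checkpoint first,
        # so the effect of this message can later be undone on its side.
        if self.checkpoint is not None and peer not in self.notified:
            peer.enter_rp()
            self.notified.append(peer)
        peer.state[key] = value  # stands in for the peer processing the data

    def commit(self):
        # RP finished successfully: drop the checkpoint and tell peers to do so.
        self.checkpoint = None
        peers, self.notified = self.notified, []
        for peer in peers:
            peer.commit()

    def recover(self):
        # Error inside the RP: restore the checkpoint and cascade the rollback.
        if self.checkpoint is not None:
            self.state = self.checkpoint
            self.checkpoint = None
        peers, self.notified = self.notified, []
        for peer in peers:
            peer.recover()


# Usage: a web tier writes to a database tier inside an RP, then fails.
web, db = Peer("web"), Peer("db")
db.state["rows"] = 1
web.enter_rp()
web.state["pending"] = True
web.send(db, "rows", 2)   # db checkpoints before applying the update
web.recover()             # both tiers return to their pre-RP state
```

Without the cascaded checkpoint, rolling back only the web tier would leave the database holding an update from a request that, from the web tier's point of view, never happened; that mismatch is the state inconsistency the abstract refers to.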