Self-recovery in server programs

Authors:
Vijay Nagarajan; Dennis Jeffrey; Rajiv Gupta
Affiliations:
University of California, Riverside, Riverside, CA, USA (all authors)
Venue:
Proceedings of the 2009 International Symposium on Memory Management
Year:
2009

Abstract

It is important that long-running server programs remain available in the presence of software failures. However, server programs do fail, and one important cause of these failures is memory errors. Bugs in server code, such as buffer overflows and integer overflows, are exposed by certain user requests and lead to memory corruption, which often results in crashes. One safe way to recover from such crashes is to checkpoint program state periodically and roll back to the most recent checkpoint when a crash occurs. However, periodic checkpointing can be quite expensive. Furthermore, because recovery may involve rolling back a considerable amount of state in addition to replaying several benign user requests, the throughput and response time of the server can drop significantly during rollback recovery. In this paper, we first conduct a detailed study of how memory corruption propagates in server programs. Our study shows that memory locations corrupted while a user request is being processed generally do not propagate across user requests. Instead, corrupted locations are usually cleansed automatically, either when the memory (stack or heap) is deallocated or when it is overwritten with uncorrupted values. This self-cleansing property of server programs led us to believe that recovering from crashes does not necessarily require an expensive rollback of state. Motivated by this observation, we propose SRS, a technique for self-recovery in server programs that exploits self-cleansing to recover from crashes. Memory locations that are not fully cleansed are restored in a demand-driven fashion, which makes SRS very efficient.
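Self-cleansing and demand-driven restoration can be illustrated with a toy model. The following is a minimal sketch, not the SRS implementation. It assumes a hypothetical `ShadowMemory` class that keeps one "corrupted" bit per address (in the spirit of shadow memory) and saves a location's clean value the first time a faulty request overwrites it. A clean overwrite or a `free` counts as cleansing, and a location that is still corrupted is restored only when it is next read.

```python
class ShadowMemory:
    """Toy memory with a per-address 'corrupted' bit (hypothetical model)."""

    def __init__(self):
        self.mem = {}           # address -> current value
        self.corrupted = set()  # addresses last written by a faulty request
        self.undo = {}          # address -> clean value saved before corruption

    def write(self, addr, value, faulty=False):
        if faulty:
            # Save the pre-corruption value only once, on the first bad write.
            if addr not in self.corrupted:
                self.undo[addr] = self.mem.get(addr)
            self.corrupted.add(addr)
        else:
            # Overwriting with an uncorrupted value cleanses the location.
            self.corrupted.discard(addr)
            self.undo.pop(addr, None)
        self.mem[addr] = value

    def free(self, addr):
        # Deallocation also cleanses: the corrupted value can no longer be read.
        self.corrupted.discard(addr)
        self.undo.pop(addr, None)
        self.mem.pop(addr, None)

    def read(self, addr):
        # Demand-driven restoration: only locations that are still corrupted
        # when something reads them are rolled back to their saved value.
        if addr in self.corrupted:
            clean = self.undo.pop(addr)
            self.corrupted.discard(addr)
            if clean is None:
                self.mem.pop(addr, None)  # location did not exist before
            else:
                self.mem[addr] = clean
        return self.mem.get(addr)


m = ShadowMemory()
m.write(0x10, 5)                 # benign request initializes state
m.write(0x10, 999, faulty=True)  # faulty request corrupts 0x10 ...
m.write(0x20, 111, faulty=True)  # ... and a scratch buffer at 0x20
m.free(0x20)                     # buffer freed: cleansed without any restore
print(m.read(0x10))              # 5 (restored on demand)
```

In this model no work is spent on locations that are cleansed by deallocation or overwriting. Only the one location that survives corrupted and is later read pays a restoration cost, unlike a periodic checkpoint of all state.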
Thus, in SRS, when a crash occurs, the crash is suppressed instead of rolling back to a safe state, and the program is made to execute forward past the crash. A mechanism we call crash suppression prevents further crashes from recurring as execution proceeds. Experiments on real-world server programs with real bugs show that in every case the server program recovered efficiently from the crash and the faulty user request was isolated from subsequent benign user requests.
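Forward execution past a crash can be contrasted with rollback using a toy request loop. This is only an analogy under stated assumptions: a Python exception stands in for a crash. The hypothetical `serve` function plays the role of crash suppression by dropping the faulty request's response and continuing with the next request, rather than restoring a checkpoint and replaying earlier requests. The benign handler's clean overwrite of the shared buffer stands in for self-cleansing. The sketch does not model how SRS actually suppresses crashes in real server programs.

```python
from typing import Callable, Dict, List, Union

State = Dict[str, List[int]]
Handler = Callable[[State], int]


def serve(requests: List[Handler], state: State) -> List[Union[int, str]]:
    """Toy forward-recovery loop (hypothetical; not the SRS implementation)."""
    responses: List[Union[int, str]] = []
    for handler in requests:
        try:
            responses.append(handler(state))
        except (IndexError, KeyError) as err:
            # Analog of crash suppression: no checkpoint is restored and no
            # earlier requests are replayed; the faulty request is dropped and
            # execution continues forward with the next request.
            responses.append(f"error: {type(err).__name__}")
    return responses


def benign(state: State) -> int:
    # Overwrites the shared buffer with clean values (self-cleansing).
    state["buf"] = [1, 2, 3, 4]
    return sum(state["buf"])


def faulty(state: State) -> int:
    # Corrupts the shared buffer, then "crashes" with an out-of-bounds read.
    state["buf"] = [7, 7, 7, 7]
    return state["buf"][10]


print(serve([benign, faulty, benign], {}))  # [10, 'error: IndexError', 10]
```

The corrupted buffer survives the suppressed crash and is cleansed here only because the next benign request overwrites it. A location that no later request overwrites would still need restoring, which is the case that demand-driven restoration addresses.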