First-aid: surviving and preventing memory management bugs during production runs

Authors:
Qi Gao; Wenbin Zhang; Yan Tang; Feng Qin
Affiliations:
Ohio State University, Columbus, OH, USA (all four authors)
Venue:
Proceedings of the 4th ACM European Conference on Computer Systems
Year:
2009

Abstract

Memory bugs in C/C++ programs severely affect system availability and security. This paper presents First-Aid, a lightweight runtime system that survives software failures caused by common memory management bugs and, during production runs, prevents future failures caused by the same bugs. Upon a failure, First-Aid diagnoses the bug type and identifies the memory objects that trigger the bug. To do so, it rolls the program back to previous checkpoints and re-executes it under two types of environmental changes: changes that can prevent the memory bug from manifesting, and changes that can expose it. Based on the diagnosis, First-Aid generates and applies runtime patches that avoid the memory bug and prevent it from recurring. It also validates that the runtime patches have consistent effects and generates on-site diagnostic reports to help developers fix the bugs.

We have implemented First-Aid on Linux and evaluated it with seven applications containing various types of memory bugs, including buffer overflow, uninitialized read, dangling pointer read/write, and double free. The results show that First-Aid can quickly diagnose the tested bugs and recover applications from failures, in 0.084 to 3.978 seconds. The runtime patches it generates can prevent future failures caused by the diagnosed bugs, and its reports provide detailed diagnostic information on both the root cause and the manifestation of each bug. During normal execution, First-Aid incurs low overhead (0.4% to 11.6%, with an average of 3.7%) on the tested buggy applications, SPEC INT2000, and four allocation-intensive programs.
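The abstract does not say how a runtime patch is represented, so the following C sketch is only an illustration built on assumptions of our own. It supposes that a patch is attached to every object allocated at one call site, named here by a plain string, and it shows three preventive changes that line up with the bug types listed above: padding (buffer overflow), zero-filling (uninitialized read), and deferring `free` through a small quarantine (dangling pointer access and repeated free). The names `patched_malloc`, `patched_free`, `flush_delay_queue`, the site strings, the 64-byte pad, and the four-slot queue are all hypothetical, not First-Aid's actual interface or policy.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical patch kinds, one per preventive change in this sketch. */
enum {
    PATCH_PAD        = 1u << 0, /* extra slack after the object: absorbs small overflows */
    PATCH_ZERO_FILL  = 1u << 1, /* zero the object: uninitialized reads see 0 */
    PATCH_DELAY_FREE = 1u << 2  /* quarantine on free: dangling accesses hit live memory */
};

/* A patch applies to every object allocated at one (hypothetical) call site. */
struct patch {
    const char *site;
    unsigned flags;
    size_t pad_bytes;
};

/* Illustrative patch table; a real system would derive it from diagnosis. */
static const struct patch patch_table[] = {
    { "parse_header", PATCH_PAD | PATCH_ZERO_FILL, 64 },
    { "session_new",  PATCH_DELAY_FREE,            0  },
};

/* Per-object header, sized like max_align_t so the user pointer stays aligned. */
typedef union {
    struct { unsigned flags; int quarantined; } meta;
    max_align_t align;
} hdr_t;

#define DELAY_SLOTS 4
static hdr_t *delay_queue[DELAY_SLOTS]; /* FIFO ring of deferred frees */
static size_t delay_next;

static const struct patch *lookup_patch(const char *site) {
    if (site == NULL) return NULL;
    for (size_t i = 0; i < sizeof patch_table / sizeof patch_table[0]; i++)
        if (strcmp(patch_table[i].site, site) == 0) return &patch_table[i];
    return NULL;
}

void *patched_malloc(size_t size, const char *site) {
    const struct patch *p = lookup_patch(site);
    unsigned flags = p ? p->flags : 0;
    size_t pad = (flags & PATCH_PAD) ? p->pad_bytes : 0;
    if (size > SIZE_MAX - sizeof(hdr_t) - pad) return NULL; /* size overflow */
    hdr_t *h = malloc(sizeof(hdr_t) + size + pad);
    if (h == NULL) return NULL;
    h->meta.flags = flags;
    h->meta.quarantined = 0;
    if (flags & PATCH_ZERO_FILL) memset(h + 1, 0, size + pad);
    return h + 1; /* user data starts right after the header */
}

void patched_free(void *ptr) {
    if (ptr == NULL) return;
    hdr_t *h = (hdr_t *)ptr - 1;
    /* Only meaningful for quarantined objects, whose header is still valid:
       a repeated free of such an object is ignored. */
    if (h->meta.quarantined) return;
    if (h->meta.flags & PATCH_DELAY_FREE) {
        h->meta.quarantined = 1;
        free(delay_queue[delay_next]); /* evict oldest entry; free(NULL) is a no-op */
        delay_queue[delay_next] = h;
        delay_next = (delay_next + 1) % DELAY_SLOTS;
        return;
    }
    free(h);
}

/* Release everything still held in the quarantine. */
void flush_delay_queue(void) {
    for (size_t i = 0; i < DELAY_SLOTS; i++) {
        free(delay_queue[i]);
        delay_queue[i] = NULL;
    }
    delay_next = 0;
}
```

For example, `patched_malloc(16, "parse_header")` returns a 16-byte object backed by 80 zeroed bytes, so a 4-byte overflow lands in the padding instead of a neighboring object. An object from `"session_new"` stays readable after `patched_free` until four further deferred frees evict it, and a second `patched_free` on it in the meantime is ignored. Because padding and quarantine cost memory, a scheme like this would presumably apply them only to the call sites a diagnosis implicates rather than to every allocation.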