ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution

  • Authors:
  • Wei Zhang; Marc de Kruijf; Ang Li; Shan Lu; Karthikeyan Sankaralingam

  • Affiliations:
  • Computer Sciences Department, University of Wisconsin--Madison, Madison, WI, USA; Computer Sciences Department, University of Wisconsin--Madison / Google, Inc, Madison, WI, USA; Computer Sciences Department, University of Wisconsin--Madison, Madison, WI, USA; Computer Sciences Department, University of Wisconsin--Madison, Madison, WI, USA; Computer Sciences Department, University of Wisconsin--Madison, Madison, WI, USA

  • Venue:
  • Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
  • Year:
  • 2013

Abstract

Many concurrency bugs are hidden in deployed software and cause severe failures for end-users. When they finally manifest and become known by developers, they are difficult to fix correctly. To support end-users, we need techniques that help software survive hidden concurrency bugs during production runs. To help developers, we need techniques that fix exposed concurrency bugs. The state-of-the-art techniques on concurrency-bug fixing and survival only satisfy a subset of four important properties: compatibility, correctness, generality, and performance.

We aim to develop a system that satisfies all of these four properties. To achieve this goal, we leverage two observations: (1) rolling back a single thread is sufficient to recover from most concurrency-bug failures; (2) reexecuting an idempotent region, which requires no memory-state checkpoint, is sufficient to recover from many concurrency-bug failures. Our system ConAir includes a static analysis component that automatically identifies potential failure sites, a static analysis component that automatically identifies the idempotent code regions around every failure site, and a code-transformation component that inserts rollback-recovery code around the identified idempotent regions. We evaluated ConAir on 10 real-world concurrency bugs in widely used C/C++ open-source applications. These bugs cover different types of failure symptoms and root causes. Quantitatively, ConAir helps software survive failures caused by all of these bugs with negligible run-time overhead (
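
The two observations in the abstract (single-thread rollback, and reexecution of an idempotent region without a memory checkpoint) can be illustrated with a small C sketch. It is not taken from the paper: it assumes `setjmp`/`longjmp` as the rollback mechanism, and every name in it (`read_config_with_recovery`, `load_shared_config`, `reads_until_ready`, `MAX_RETRIES`) is hypothetical. The racing writer thread is simulated by a counter so that the example stays deterministic.

```c
#include <setjmp.h>
#include <stddef.h>

enum { MAX_RETRIES = 100 };        /* hypothetical bound on reexecution attempts */

static const int config_value = 42;
static int reads_until_ready = 0;  /* simulates how late the racing writer is */

/* Stand-in for reading a shared pointer that another thread initializes.
 * It returns NULL until the simulated writer has "caught up". */
static const int *load_shared_config(void) {
    if (reads_until_ready > 0) {
        reads_until_ready--;
        return NULL;
    }
    return &config_value;
}

/* Reads the shared value. A NULL pointer would normally crash the thread;
 * here the failure site jumps back to the start of the region instead.
 * Returns the value, or -1 if recovery gives up. The number of
 * reexecutions is reported through retries_used. */
int read_config_with_recovery(int *retries_used) {
    jmp_buf reexec_point;
    volatile int retries = 0;      /* volatile: modified between setjmp and longjmp */

    setjmp(reexec_point);          /* start of the idempotent region */

    /* The region only reads shared state into a local, so rerunning it
     * needs no memory checkpoint and affects no other thread. */
    const int *cfg = load_shared_config();

    if (cfg == NULL) {             /* potential failure site */
        if (retries < MAX_RETRIES) {
            retries++;
            longjmp(reexec_point, 1);  /* roll back this thread only */
        }
        *retries_used = retries;
        return -1;                 /* recovery gave up */
    }

    *retries_used = retries;
    return *cfg;
}
```

With `reads_until_ready` set to 3, the call reexecutes the region three times and then returns 42. Bounding the retries is a design choice of this sketch, so that a failure that is not caused by an unlucky interleaving still surfaces instead of looping forever.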