A lightweight and portable approach to making concurrent failures reproducible

  • Authors:
  • Qingzhou Luo;Sai Zhang;Jianjun Zhao;Min Hu

  • Affiliations:
  • School of Software, Shanghai Jiao Tong University;Computer Science S Engineering Department, University of Washington;School of Software, Shanghai Jiao Tong University;School of Software, Shanghai Jiao Tong University

  • Venue:
  • FASE'10 Proceedings of the 13th international conference on Fundamental Approaches to Software Engineering
  • Year:
  • 2010

Abstract

Concurrent programs often exhibit bugs caused by unintended interference among concurrent threads. Such bugs are hard to reproduce because they typically occur only under a very specific interleaving of the executing threads, and a bug (or software failure) in a concurrent program is very hard to fix without being able to reproduce it. In this paper, we present an approach, called ConCrash, that automatically and deterministically reproduces concurrent failures by recording the logical thread schedule and generating unit tests. For a given bug (failure), ConCrash records the logical thread scheduling order and preserves object states in memory at runtime. ConCrash then reproduces the failure offline using only the saved information, without the need for JVM-level or OS-level support. To reduce the runtime performance overhead, ConCrash employs a static data race detection technique to report potential race conditions, and instruments only those places. We implemented the ConCrash approach in a prototype tool for Java and evaluated it on a number of multi-threaded Java benchmarks. We successfully reproduced a number of real concurrent bugs (e.g., deadlocks, data races, and atomicity violations) with acceptable overhead.
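
To make the record-and-replay idea concrete, the following is a minimal sketch of how a logical thread schedule could be recorded at instrumented access points and later enforced during replay. It is not ConCrash's implementation: the class `LogicalSchedule` and the methods `record`, `awaitTurn`, `finishTurn`, and `replay` are hypothetical names introduced here for illustration, and the sketch leaves out object-state preservation, unit-test generation, and the static race detection that decides which places to instrument.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch: names and structure are assumptions, not ConCrash's API.
final class LogicalSchedule {
    private final List<String> order = new ArrayList<>(); // recorded thread names
    private int cursor = 0;                                // next position to replay

    // Recording mode: called at each instrumented shared access.
    synchronized void record(String threadName) {
        order.add(threadName);
    }

    // Replay mode: block until the recorded schedule says it is this thread's turn.
    synchronized void awaitTurn(String threadName) throws InterruptedException {
        while (cursor < order.size() && !order.get(cursor).equals(threadName)) {
            wait();
        }
    }

    // Replay mode: mark the current step as done and wake the waiting threads.
    synchronized void finishTurn() {
        cursor++;
        notifyAll();
    }

    // Replays a recorded order with one thread per distinct name and returns
    // the order in which the threads actually reached the shared access.
    static List<String> replay(List<String> recorded) {
        LogicalSchedule schedule = new LogicalSchedule();
        for (String name : recorded) {
            schedule.record(name);
        }
        List<String> observed = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (String name : new LinkedHashSet<>(recorded)) {
            int steps = Collections.frequency(recorded, name);
            Thread t = new Thread(() -> {
                try {
                    for (int i = 0; i < steps; i++) {
                        schedule.awaitTurn(name);
                        observed.add(name); // stands in for the instrumented access
                        schedule.finishTurn();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, name);
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("replay interrupted", e);
        }
        return observed;
    }
}
```

Calling `LogicalSchedule.replay(Arrays.asList("T1", "T2", "T2", "T1"))` should yield the same sequence on every run, because each thread waits on the schedule's monitor until the cursor reaches its own name. Enforcing the order in ordinary Java code in this way is one route to replay that does not depend on the JVM or OS scheduler.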