Finding complex concurrency bugs in large multi-threaded applications

Authors:
Pedro Fonseca;Cheng Li;Rodrigo Rodrigues
Affiliations:
Max Planck Institute for Software Systems (MPI-SWS), Kaiserslautern and Saarbrücken, Germany;Max Planck Institute for Software Systems (MPI-SWS), Kaiserslautern and Saarbrücken, Germany;Max Planck Institute for Software Systems (MPI-SWS), Kaiserslautern and Saarbrücken, Germany
Venue:
Proceedings of the sixth conference on Computer systems
Year:
2011

Citing 28
Cited 7

Software transactional memory

Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing
Ownership types for safe programming: preventing data races and deadlocks

OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
RacerX: effective, static detection of race conditions and deadlocks

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
A serializability violation detector for shared-memory server programs

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
DART: directed automated random testing

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Associating synchronization constraints with data in an object-oriented language

Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Effective static race detection for Java

Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Producing scheduling that causes concurrent programs to fail

Proceedings of the 2006 workshop on Parallel and distributed systems: testing and debugging
AVIO: detecting atomicity violations via access interleaving invariants

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Towards a framework and a benchmark for testing tools for multi-threaded programs: Research Articles

Concurrency and Computation: Practice & Experience - Parallel and Distributed Systems: Testing and Debugging (PADTAD)
Automatically classifying benign and harmful data races using replay analysis

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Model checking large network protocol implementations

NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
An improved construction for counting bloom filters

ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
Learning from mistakes: a comprehensive study on real world concurrency bug characteristics

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Generating targeted queries for database testing

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
EXE: Automatically Generating Inputs of Death

ACM Transactions on Information and System Security (TISSEC)
FastTrack: efficient and precise dynamic race detection

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
LiteRace: effective sampling for lightweight data-race detection

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Effective static deadlock detection

ICSE '09 Proceedings of the 31st International Conference on Software Engineering
Experience with Model Checking Linearizability

Proceedings of the 16th International SPIN Workshop on Model Checking Software
Asserting and checking determinism for multithreaded programs

Proceedings of the the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
A randomized scheduler with probabilistic guarantees of finding bugs

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Line-up: a complete and automatic linearizability checker

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs

OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Finding and reproducing Heisenbugs in concurrent programs

OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Effective data-race detection for the kernel

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Ad hoc synchronization considered harmful

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Automatically proving linearizability

CAV'10 Proceedings of the 22nd international conference on Computer Aided Verification

Efficient deterministic multithreading through schedule relaxation

SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Pervasive detection of process races in deployed systems

SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Detecting and surviving data races using complementary schedules

SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Data races vs. data race bugs: telling the difference with portend

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Fully automatic and precise detection of thread safety violations

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Model checking database applications

TACAS'13 Proceedings of the 19th international conference on Tools and Algorithms for the Construction and Analysis of Systems
Back to the future: fault-tolerant live update with time-traveling state transfer

LISA'13 Proceedings of the 27th international conference on Large Installation System Administration

Quantified Score

Hi-index	0.00

Visualization

Abstract

Parallel software is increasingly necessary to take advantage of multi-core architectures, but it is also prone to concurrency bugs which are particularly hard to avoid, find, and fix, since their occurrence depends on specific thread interleavings. In this paper we propose a concurrency bug detector that automatically identifies when an execution of a program triggers a concurrency bug. Unlike previous concurrency bug detectors, we are able to find two particularly hard classes of bugs. The first are bugs that manifest themselves by subtle violation of application semantics, such as returning an incorrect result. The second are latent bugs, which silently corrupt internal data structures, and are especially hard to detect because when these bugs are triggered they do not become immediately visible. Pike detects these concurrency bugs by checking both the output and the internal state of the application for linearizability at the level of user requests. This paper presents this technique for finding concurrency bugs, its application in the context of a testing tool that systematically searches for such problems, and our experience in applying our approach to MySQL, a large-scale complex multi-threaded application. We were able to find several concurrency bugs in a stable version of the application, including subtle violations of application semantics, latent bugs, and incorrect error replies.