Pervasive detection of process races in deployed systems

Authors:
Oren Laadan;Nicolas Viennot;Chia-Che Tsai;Chris Blinn;Junfeng Yang;Jason Nieh
Affiliations:
Columbia University;Columbia University;Columbia University;Columbia University;Columbia University;Columbia University
Venue:
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Year:
2011

Abstract

Process races occur when multiple processes access shared operating system resources, such as files, without proper synchronization. We present the first study of real process races and the first system designed to detect them. Our study of hundreds of applications shows that process races are numerous, difficult to debug, and a real threat to reliability. To address this problem, we created RacePro, a system for automatically detecting these races. RacePro checks deployed systems in-vivo by recording live executions, then deterministically replaying and checking them later. This approach increases checking coverage beyond the configurations or executions covered by software vendors or beta testing sites. RacePro records multiple processes, detects races in the recording among system calls that may concurrently access shared kernel objects, then tries different execution orderings of such system calls to determine which races are harmful and result in failures. To simplify race detection, RacePro models under-specified system calls based on load and store micro-operations. To reduce false positives and negatives, RacePro uses a replay and go-live mechanism to distill harmful races from benign ones. We have implemented RacePro in Linux, shown that it imposes only modest recording overhead, and used it to detect a number of previously unknown bugs in real applications caused by process races.
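The detection step described in the abstract can be illustrated with a small sketch. This is not RacePro's implementation. It assumes that each recorded system call has already been translated into load and store micro-operations on named kernel objects, and that the recording carries a vector clock per call so that ordered calls can be told apart from concurrent ones. The class names, the object naming scheme, the vector clocks, and the mapping of `stat` to a load and `unlink` to a store are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MicroOp:
    kind: str  # "load" or "store"
    obj: str   # hypothetical identifier of a shared kernel object


@dataclass(frozen=True)
class Syscall:
    pid: int
    name: str
    ops: tuple    # micro-operations this call is modeled as (assumed mapping)
    clock: tuple  # assumed vector clock, one entry per process


def happens_before(a, b):
    """True if a is ordered before b according to the vector clocks."""
    return a.clock != b.clock and all(x <= y for x, y in zip(a.clock, b.clock))


def conflicts(a, b):
    """True if the calls touch the same object and at least one op is a store."""
    return any(
        x.obj == y.obj and "store" in (x.kind, y.kind)
        for x in a.ops
        for y in b.ops
    )


def find_races(trace):
    """Return pairs of calls from different processes that are concurrent and conflict."""
    found = []
    for a, b in combinations(trace, 2):
        if a.pid == b.pid:
            continue
        if happens_before(a, b) or happens_before(b, a):
            continue
        if conflicts(a, b):
            found.append((a, b))
    return found


# A tiny hand-written trace: stat and unlink on the same path are concurrent,
# while the write and read on another file are ordered by the clocks.
s1 = Syscall(0, "stat", (MicroOp("load", "dentry:/tmp/f"),), (1, 0))
s2 = Syscall(1, "unlink", (MicroOp("store", "dentry:/tmp/f"),), (0, 1))
s3 = Syscall(0, "write", (MicroOp("store", "inode:/tmp/g"),), (2, 0))
s4 = Syscall(1, "read", (MicroOp("load", "inode:/tmp/g"),), (2, 2))

races = find_races([s1, s2, s3, s4])
for a, b in races:
    print(f"possible race: pid {a.pid} {a.name} vs pid {b.pid} {b.name}")
```

A pair reported by a check like this is only a candidate. Per the abstract, RacePro then replays the recording with the racing calls in different orders to separate harmful races from benign ones, a step this sketch does not attempt.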