Instrumentation and sampling strategies for cooperative concurrency bug isolation

Authors:
Guoliang Jin; Aditya Thakur; Ben Liblit; Shan Lu
Affiliations:
University of Wisconsin-Madison, Madison, WI, USA; University of Wisconsin-Madison, Madison, WI, USA; University of Wisconsin-Madison, Madison, WI, USA; University of Wisconsin-Madison, Madison, WI, USA
Venue:
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Year:
2010

Citing 38
Cited 21

Debugging Parallel Programs with Instant Replay

IEEE Transactions on Computers
Wait-free synchronization

ACM Transactions on Programming Languages and Systems (TOPLAS)
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Eraser: a dynamic data race detector for multithreaded programs

ACM Transactions on Computer Systems (TOCS)
Efficient and precise datarace detection for multithreaded object-oriented programs

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Bug isolation via remote program sampling

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
RacerX: effective, static detection of race conditions and deadlocks

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Atomizer: a dynamic atomicity checker for multithreaded programs

Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Race checking by context inference

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Scalable statistical bug isolation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Applications of synchronization coverage

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
RaceTrack: efficient detection of data race conditions via adaptive tracking

Proceedings of the twentieth ACM symposium on Operating systems principles
Associating synchronization constraints with data in an object-oriented language

Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
AVIO: detecting atomicity violations via access interleaving invariants

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Automatically classifying benign and harmful data races using replay analysis

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Iterative context bounding for systematic testing of multithreaded programs

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Statistical debugging using compound boolean predicates

Proceedings of the 2007 international symposium on Software testing and analysis
Unit testing concurrent software

Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering
Execution replay of multiprocessor virtual machines

Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Learning from mistakes: a comprehensive study on real world concurrency bug characteristics

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Race directed random testing of concurrent programs

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A calculus of atomic actions

Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Detecting and tolerating asymmetric races

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
CTrigger: exposing atomicity violation bugs from their hiding places

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
ISOLATOR: dynamically ensuring isolation in concurrent programs

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Atom-Aid: Detecting and Surviving Atomicity Violations

IEEE Micro
Two hardware-based approaches for deterministic multiprocessor replay

Communications of the ACM - One Laptop Per Child: Vision vs. Reality
FastTrack: efficient and precise dynamic race detection

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
LiteRace: effective sampling for lightweight data-race detection

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
HOLMES: Effective statistical debugging via efficient path profiling

ICSE '09 Proceedings of the 31st International Conference on Software Engineering
A case for an interleaving constrained shared-memory multi-processor

Proceedings of the 36th annual international symposium on Computer architecture
Asserting and checking determinism for multithreaded programs

Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
Grace: safe multithreaded programming for C/C++

Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Abstraction-guided synthesis of synchronization

Proceedings of the 37th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Fast and accurate static data-race detection for concurrent programs

CAV'07 Proceedings of the 19th international conference on Computer aided verification
PACER: proportional detection of data races

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Do I use the wrong definition?: DeFuse: definition-use invariants for detecting concurrency and sequential bugs

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Cooperative crug isolation

WODA '09 Proceedings of the Seventh International Workshop on Dynamic Analysis

Isolating and understanding concurrency errors using reconstructed execution fragments

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Automated atomicity-violation fixing

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Quarantine: fault tolerance for concurrent servers with data-driven selective isolation

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
A step towards transparent integration of input-consciousness into dynamic program optimizations

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
The potential of sampling for dynamic analysis

Proceedings of the ACM SIGPLAN 6th Workshop on Programming Languages and Analysis for Security
Fully automatic and precise detection of thread safety violations

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
CARISMA: a context-sensitive approach to race-condition sample-instance selection for multithreaded applications

Proceedings of the 2012 International Symposium on Software Testing and Analysis
Understanding the interleaving-space overlap across inputs and software versions

HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Collaborative energy debugging for mobile devices

HotDep'12 Proceedings of the Eighth USENIX conference on Hot Topics in System Dependability
Automated concurrency-bug fixing

OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
ConMem: Detecting Crash-Triggering Concurrency Bugs through an Effect-Oriented Approach

ACM Transactions on Software Engineering and Methodology (TOSEM)
Production-run software failure diagnosis via hardware performance counters

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
CLAP: recording local executions to reproduce concurrency failures

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Griffin: grouping suspicious memory-access patterns to improve understanding of concurrency bugs

Proceedings of the 2013 International Symposium on Software Testing and Analysis
Fault comprehension for concurrent programs

Proceedings of the 2013 International Conference on Software Engineering
Debugging non-deadlock concurrency bugs

Proceedings of the 2013 International Symposium on Software Testing and Analysis
Efficient concurrency-bug detection across inputs

Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

ACM SIGOPS 24th Symposium on Operating Systems Principles
Carat: collaborative energy diagnosis for mobile devices

Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems
RaceMob: crowdsourced data race detection

Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Leveraging the short-term memory of hardware to diagnose production-run software failures

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Abstract

Fixing concurrency bugs (or "crugs") is critical in modern software systems. Static analyses to find crugs such as data races and atomicity violations scale poorly, while dynamic approaches incur high run-time overheads. Crugs manifest only under specific execution interleavings that may not arise during in-house testing, thereby demanding a lightweight program monitoring technique that can be used post-deployment. We present Cooperative Crug Isolation (CCI), a low-overhead instrumentation framework to diagnose production-run failures caused by crugs. CCI tracks specific thread interleavings at run-time, and uses statistical models to identify strong failure predictors among these. We offer a varied suite of predicates that represent different trade-offs between complexity and fault isolation capability. We also develop variant random sampling strategies that suit different types of predicates and help keep the run-time overhead low. Experiments with 9 real-world bugs in 6 non-trivial C applications show that these schemes span a wide spectrum of performance and diagnosis capabilities, each suitable for different usage scenarios.
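
The abstract does not say which statistical model CCI uses, so the following is only a minimal sketch of one plausible scoring step. It assumes the "Increase" score familiar from earlier statistical bug isolation work: the fraction of failing runs among runs where a predicate was observed true, minus the fraction of failing runs among runs where the predicate was sampled at all. The class, the function names, and the predicate names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PredicateCounts:
    """Hypothetical per-predicate tallies aggregated over many sampled runs."""
    true_in_failing: int      # failing runs where the predicate was observed true
    true_in_passing: int      # passing runs where the predicate was observed true
    observed_in_failing: int  # failing runs where the predicate was sampled at all
    observed_in_passing: int  # passing runs where the predicate was sampled at all


def increase_score(c: PredicateCounts) -> float:
    """Return Failure(P) - Context(P); larger positive values suggest that
    the predicate being true is correlated with failure."""
    true_total = c.true_in_failing + c.true_in_passing
    observed_total = c.observed_in_failing + c.observed_in_passing
    if true_total == 0 or observed_total == 0:
        return 0.0  # no evidence either way
    failure = c.true_in_failing / true_total
    context = c.observed_in_failing / observed_total
    return failure - context


def rank_predicates(counts: Dict[str, PredicateCounts]) -> List[Tuple[str, float]]:
    """Sort predicates from strongest to weakest failure predictor."""
    scored = [(name, increase_score(c)) for name, c in counts.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Illustrative, made-up tallies for two hypothetical interleaving predicates.
    example = {
        "remote_write_between_local_accesses": PredicateCounts(40, 2, 50, 150),
        "always_true_at_site": PredicateCounts(50, 150, 50, 150),
    }
    for name, score in rank_predicates(example):
        print(f"{name}: {score:.3f}")
```

In this sketch a predicate that is true in every run it is sampled in scores zero, because it carries no more information than reaching the instrumentation site does. A predicate that is true mostly in failing runs rises to the top of the ranking.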