Life, death, and the critical transition: finding liveness bugs in systems code

Authors:
Charles Killian; James W. Anderson; Ranjit Jhala; Amin Vahdat
Affiliations:
University of California, San Diego; University of California, San Diego; University of California, San Diego; University of California, San Diego
Venue:
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Year:
2007

Abstract

Modern software model checkers find safety violations: breaches where the system enters some bad state. However, we argue that checking liveness properties offers both a richer and more natural way to search for errors, particularly in complex concurrent and distributed systems. Liveness properties specify desirable system behaviors that must be satisfied eventually but need not hold at every moment, for example after a failure or during system initialization. Existing software model checkers cannot check liveness because demonstrating a violation requires finding an infinite execution that never satisfies the liveness property. We present heuristics to find a large class of liveness violations and to identify the critical transition of the violating execution. The critical transition is the step in an execution that moves the system from a state that does not currently satisfy some liveness property, but from which recovery is still possible, to a dead state that can never achieve the liveness property. Our software model checker, MaceMC, isolates complex liveness errors in our implementations of Pastry, Chord, a reliable transport protocol, and an overlay tree.
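
The notion of a critical transition can be illustrated with a small sketch. This is not MaceMC's algorithm or API: the state names, the `TRANSITIONS` table, and the functions `is_live`, `can_recover`, and `critical_transition` are all hypothetical, and the toy system is small enough that recoverability is decided by exhaustive search, where a checker for real code would have to rely on heuristics. Under those assumptions, the sketch takes an execution that ends in a dead state and locates the step at which recovery stopped being possible.

```python
from collections import deque

# Hypothetical toy transition system (not taken from the paper):
# each state maps to the states it can move to in one step.
TRANSITIONS = {
    "init": ["joining"],
    "joining": ["joined", "retry", "lost_bootstrap"],
    "retry": ["joining"],
    "lost_bootstrap": ["isolated"],
    "isolated": ["isolated"],
    "joined": ["joined"],
}


def is_live(state):
    """Assumed liveness property: the node eventually reaches 'joined'."""
    return state == "joined"


def can_recover(state):
    """True if some live state is reachable from `state`.

    Exhaustive breadth-first search; feasible only because the toy
    graph is tiny and finite.
    """
    seen, queue = {state}, deque([state])
    while queue:
        current = queue.popleft()
        if is_live(current):
            return True
        for nxt in TRANSITIONS[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def critical_transition(path):
    """Return (i, path[i], path[i + 1]) for the step that enters a dead state.

    Returns None if the path starts dead or ends in a recoverable state.
    Binary search is valid because every successor of a dead state is
    also dead, so the recoverable states form a prefix of the path.
    """
    if not can_recover(path[0]) or can_recover(path[-1]):
        return None
    lo, hi = 0, len(path) - 1  # invariant: path[lo] recoverable, path[hi] dead
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if can_recover(path[mid]):
            lo = mid
        else:
            hi = mid
    return lo, path[lo], path[hi]


execution = ["init", "joining", "retry", "joining",
             "lost_bootstrap", "isolated", "isolated"]
print(critical_transition(execution))  # (3, 'joining', 'lost_bootstrap')
```

In this toy execution the states before index 4 never satisfy the property but can still reach `joined`; the step from `joining` to `lost_bootstrap` is the critical transition because nothing after it can.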