MODIST: transparent model checking of unmodified distributed systems

Authors:
Junfeng Yang; Tisheng Chen; Ming Wu; Zhilei Xu; Xuezheng Liu; Haoxiang Lin; Mao Yang; Fan Long; Lintao Zhang; Lidong Zhou
Affiliations:
Columbia University and Microsoft Research Silicon Valley; Microsoft Research Asia; Microsoft Research Asia; Microsoft Research Asia; Microsoft Research Asia; Microsoft Research Asia; Microsoft Research Asia; Tsinghua University; Microsoft Research Asia and Microsoft Research Silicon Valley; Microsoft Research Asia and Microsoft Research Silicon Valley
Venue:
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Year:
2009

Abstract

MODIST is the first model checker designed for transparently checking unmodified distributed systems running on unmodified operating systems. It achieves this transparency via a novel architecture: a thin interposition layer exposes all actions in a distributed system and a centralized, OS-independent model checking engine explores these actions systematically. We made MODIST practical through three techniques: an execution engine to simulate consistent, deterministic executions and failures; a virtual clock mechanism to avoid false positives and false negatives; and a state exploration framework to incorporate heuristics for efficient error detection. We implemented MODIST on Windows and applied it to three well-tested distributed systems: Berkeley DB, a widely used open source database; MPS, a deployed Paxos implementation; and PACIFICA, a primary-backup replication protocol implementation. MODIST found 35 bugs in total. Most importantly, it found protocol-level bugs (i.e., flaws in the core distributed protocols) in every system checked: 10 in total, including 2 in Berkeley DB, 2 in MPS, and 6 in PACIFICA.
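The architecture described above has two parts: an interposition layer that exposes each action, and a centralized engine that decides which enabled action runs next and checks for errors. The toy sketch below illustrates only that division. Everything in it is hypothetical: `ToySystem`, `replay`, `explore`, and `invariant` are not MODIST's API. Rebuilding each state by replaying its trace from a fresh start is also an assumption made for simplicity, not a description of how MODIST reaches states. The system under test is a primary that sends two writes to a backup over a network that may reorder them. The invariant requires that, once every message is delivered, the backup holds the last write.

```python
from dataclasses import dataclass, field


@dataclass
class ToySystem:
    """Hypothetical system under test: a primary sent two writes to a backup."""
    in_flight: dict = field(default_factory=dict)  # message id -> value it writes
    backup_value: int = 0

    def start(self):
        # The primary sends "x=1" (m1) and then "x=2" (m2); the network may reorder them.
        self.in_flight = {"m1": 1, "m2": 2}

    def enabled_actions(self):
        # Stand-in for the interposition layer: each in-flight message
        # exposes one action, "deliver this message now".
        return sorted(self.in_flight)

    def apply(self, msg_id):
        # Deliver one message; the backup blindly applies whatever arrives.
        self.backup_value = self.in_flight.pop(msg_id)


def invariant(system):
    # After all messages are delivered, the backup must hold the last write (2).
    return bool(system.in_flight) or system.backup_value == 2


def replay(trace):
    """Rebuild a state by re-executing a trace of actions from a fresh start."""
    system = ToySystem()
    system.start()
    for action in trace:
        system.apply(action)
    return system


def explore(max_depth=10):
    """Centralized engine: depth-first search over every action order.

    Returns the traces that violate the invariant.
    """
    bugs = []
    stack = [[]]
    while stack:
        trace = stack.pop()
        system = replay(trace)
        if not invariant(system):
            bugs.append(trace)
            continue
        if len(trace) >= max_depth:
            continue
        for action in system.enabled_actions():
            stack.append(trace + [action])
    return bugs


if __name__ == "__main__":
    print(explore())
```

Running `explore()` reports the single trace `['m2', 'm1']`. In that trace the network delivers the second write before the first, so the backup ends at 1. A fuller checker would also expose timers and injected failures (crashes, dropped messages) as actions. This sketch omits both.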