Implementing fault-tolerant services using the state machine approach: a tutorial
ACM Computing Surveys (CSUR)
A partial approach to model checking
Papers presented at the IEEE symposium on Logic in computer science
Impossibility of distributed consensus with one faulty process
Journal of the ACM (JACM)
Distributed snapshots: determining global states of distributed systems
ACM Transactions on Computer Systems (TOCS)
IEEE Transactions on Software Engineering - Special issue on formal methods in software practice
ACM Transactions on Computer Systems (TOCS)
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
The SLAM project: debugging system software via static analysis
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Towards capturing representative AS-level Internet topologies
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Asynchronous recovery without using vector timestamps
Journal of Parallel and Distributed Computing
Chord: a scalable peer-to-peer lookup protocol for internet applications
IEEE/ACM Transactions on Networking (TON)
Boolean and Cartesian Abstraction for Model Checking C Programs
TACAS 2001 Proceedings of the 7th International Conference on Tools and Algorithms for the Construction and Analysis of Systems
CMC: a pragmatic approach to model checking real code
ACM SIGOPS Operating Systems Review - OSDI '02: Proceedings of the 5th symposium on Operating systems design and implementation
SplitStream: high-bandwidth multicast in cooperative environments
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Automatic detection and repair of errors in data structures
OOPSLA '03 Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications
Modular Verification of Software Components in C
IEEE Transactions on Software Engineering
Dynamic partial-order reduction for model checking software
Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Scalability and accuracy in a large-scale network emulator
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
OpenDHT: a public DHT service and its uses
Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications
Vigilante: end-to-end containment of internet worms
Proceedings of the twentieth ACM symposium on Operating systems principles
Speculative execution in a distributed file system
Proceedings of the twentieth ACM symposium on Operating systems principles
Complementary use of runtime validation and model checking
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Using model checking to find serious file system errors
ACM Transactions on Computer Systems (TOCS)
Using queries for distributed monitoring and forensics
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Maintaining high bandwidth under dynamic network conditions
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Flashback: a lightweight extension for rollback and deterministic replay for software debugging
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Mace: language support for building distributed systems
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Iterative context bounding for systematic testing of multithreaded programs
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Model checking large network protocol implementations
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
MACEDON: methodology for automatically creating, evaluating, and designing overlay networks
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
FUSE: lightweight guaranteed distributed failure notification
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Using magpie for request extraction and workload modelling
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Enhancing server availability and security through failure-oblivious computing
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Using random subsets to build scalable network services
USITS'03 Proceedings of the 4th conference on USENIX Symposium on Internet Technologies and Systems - Volume 4
EXPLODE: a lightweight, general system for finding serious storage system errors
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Pip: detecting the unexpected in distributed systems
NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3
Rx: Treating bugs as allergies—a safe method to survive software failures
ACM Transactions on Computer Systems (TOCS)
Paxos made live: an engineering perspective
Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
Bouncer: securing software by blocking bad input
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Consensus routing: the internet as a distributed system
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
D3S: debugging deployed distributed systems
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
QVM: an efficient runtime for detecting defects in deployed systems
Proceedings of the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications
The theory of deadlock avoidance via discrete control
Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Opis: reliable distributed systems in OCaml
Proceedings of the 4th international workshop on Types in language design and implementation
MODIST: transparent model checking of unmodified distributed systems
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
CrystalBall: predicting and preventing inconsistencies in deployed distributed systems
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Simplifying distributed system development
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
Network imprecision: a new consistency metric for scalable monitoring
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Finding and reproducing Heisenbugs in concurrent programs
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Gadara: dynamic deadlock avoidance for multithreaded programs
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Life, death, and the critical transition: finding liveness bugs in systems code
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
WiDS checker: combating bugs in distributed systems
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
X-trace: a pervasive network tracing framework
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Friday: global comprehension for distributed replay
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
CAV'05 Proceedings of the 17th international conference on Computer Aided Verification
Automated systematic testing of open distributed programs
FASE'06 Proceedings of the 9th international conference on Fundamental Approaches to Software Engineering
A case for end system multicast
IEEE Journal on Selected Areas in Communications
Applying prolog to develop distributed systems
Theory and Practice of Logic Programming
Runtime instrumentation for precise flow-sensitive type analysis
RV'10 Proceedings of the First international conference on Runtime verification
Model checking a networked system without the network
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Hi-index | 0.00 |
We propose a new approach for developing and deploying distributed systems, in which nodes predict distributed consequences of their actions and use this information to detect and avoid errors. Each node continuously runs a state exploration algorithm on a recent consistent snapshot of its neighborhood and predicts possible future violations of specified safety properties. We describe a new state exploration algorithm, consequence prediction, which explores causally related chains of events that lead to property violation. This article describes the design and implementation of this approach, termed CrystalBall. We evaluate CrystalBall on RandTree, BulletPrime, Paxos, and Chord distributed system implementations. We identified new bugs in mature Mace implementations of three systems. Furthermore, we show that if the bug is not corrected during system development, CrystalBall is effective in steering the execution away from inconsistent states at runtime.