The distributed systems research community has developed many provably correct algorithms and abstractions that are in wide use. However, practical implementations of distributed systems often contain many bugs, and practitioners spend much of their time troubleshooting these bugs. In this paper we present an algorithm, retrospective causal inference, to ease the burden of troubleshooting. We end by enumerating several open research problems related to the troubleshooting process.