One of the most important challenges in distributed computing is ensuring that services are correct and available despite faults. Recently, it has been argued that fault detection can be factored out from computation, and that a generic fault detection service can be a useful abstraction for building distributed systems. However, while fault detection has been extensively studied for crash faults, little is known about detecting more general kinds of faults. This paper explores the power and the inherent costs of generic fault detection in a distributed system. We propose a formal framework that allows us to partition the set of all faults that can possibly occur in a distributed computation into several fault classes. Then we formulate the fault detection problem for a given fault class, and we show that this problem can be solved for only two specific fault classes, namely omission faults and commission faults. Finally, we derive tight lower bounds on the cost of solving the problem for these two classes in asynchronous message-passing systems.