Introduces Nest, a nested-predicate scheme for fault tolerance. Nest provides a comprehensive formal model for fault-tolerant parallel algorithms and a general methodology for designing reliable applications for multiprocessor systems. The model formalizes fault-tolerance concepts by means of three nested system predicates and the properties that govern their interrelationships. This rigorous framework facilitates the study of the specific properties that enable an algorithm to tolerate faults. From it follows an outline of systematic design techniques for adding fault tolerance to algorithms while preserving their functional characteristics.
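
The abstract does not define the three predicates, so the following is only a minimal sketch of what "nested" could mean in general: each predicate implies the next, so the sets of states they accept form a chain of inclusions. The toy state space, the predicates `strict`, `relaxed`, and `outer`, and the helpers `is_nested` and `level` are all hypothetical and chosen for illustration; they are not taken from Nest.

```python
from itertools import product

# Hypothetical toy system: a state is a pair of counters (a, b), each in 0..3.
STATES = list(product(range(4), repeat=2))

# Three illustrative predicates, invented for this sketch (not from the paper).
def strict(s):
    """Innermost predicate: the two counters agree."""
    return s[0] == s[1]

def relaxed(s):
    """Middle predicate: the counters differ by at most one."""
    return abs(s[0] - s[1]) <= 1

def outer(s):
    """Outermost predicate: the counters differ by at most two."""
    return abs(s[0] - s[1]) <= 2

def is_nested(predicates, states):
    """Return True if every predicate implies its successor on all states."""
    return all(
        (not inner(s)) or enclosing(s)
        for inner, enclosing in zip(predicates, predicates[1:])
        for s in states
    )

def level(s, predicates):
    """Index of the innermost predicate that holds in state s.

    Returns len(predicates) when no predicate holds.
    """
    for i, p in enumerate(predicates):
        if p(s):
            return i
    return len(predicates)

PREDICATES = [strict, relaxed, outer]
```

Under these assumptions, `is_nested(PREDICATES, STATES)` checks the inclusion chain exhaustively, and `level(state, PREDICATES)` classifies a state by how far it has drifted from the innermost predicate, which is one way such a scheme could distinguish degrees of degradation.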