This paper describes our experience using coordinated atomic (CA) actions as a system structuring tool to design and validate a sophisticated embedded control system for a complex industrial application with high reliability and safety requirements. Our study is based on an extended production cell model whose specification and simulator were defined and developed by FZI (Forschungszentrum Informatik, Germany). This "Fault-Tolerant Production Cell" represents a manufacturing process involving redundant mechanical devices, provided to enable continued production in the presence of machine faults. The challenge posed by the model specification is to design a control system that maintains the specified safety and liveness properties even in the presence of a large number and variety of device and sensor failures. Based on an analysis of such failures, we provide details of: 1) a design for a control program that uses CA actions to deal with both safety-related and fault tolerance concerns, and 2) the formal verification of this design using model checking. We found that CA action structuring facilitated both the design and verification tasks by enabling the various safety problems (involving possible clashes of moving machinery) to be treated independently. Even complex situations involving the concurrent occurrence of any pair of the many possible mechanical and sensor failures can be handled simply yet appropriately. The formal verification activity was performed in parallel with the design activity, and the interaction between them resulted in a combined exercise in "design for validation". Formal verification proved very valuable in identifying some subtle residual bugs in early versions of our design that would otherwise have been difficult to detect.