Developers of fault-tolerant distributed systems need to guarantee that the fault tolerance mechanisms they build are themselves reliable. Otherwise, these mechanisms may ultimately reduce overall system dependability, defeating the purpose of introducing fault tolerance in the first place. To achieve the desired levels of reliability, mechanisms for detecting and handling errors should be developed rigorously or formally. We present an approach to modeling and verifying fault-tolerant distributed systems that use exception handling as their main fault tolerance mechanism. In the proposed approach, a formal model specifies the structure of a system in terms of cooperating participants that handle exceptions in a coordinated manner, and coordinated atomic actions serve as representatives of mechanisms for exception handling in concurrent systems. We validate the approach through two case studies: (i) a system responsible for managing a production cell, and (ii) a medical control system. In both systems, the proposed approach helped us uncover design faults in the form of implicit assumptions and omissions in the original specifications.