Proceedings of the 2nd International Workshop on Software Engineering for Resilient Systems
Failure management is key to the development of safety-critical, distributed, reactive systems, which are common in applications such as avionics, automotive systems, and sensor/actuator networks. Specific challenges to effective failure management include (i) developing an understanding of the application domain so as to define what constitutes a failure; (ii) disentangling failure management concepts at design time and at runtime; and (iii) detecting and mitigating failures at the level of systems-of-systems integration. In this paper, we address (i) and (ii) by developing a failure ontology for logical and deployment architectures, respectively, including a mapping between the two. This ontology is based on the interaction patterns (or services) that define the component interplay in a distributed system. We address (iii) by defining detectors and mitigators at the service/interaction level, and we discuss how to derive detectors for a significant subset of the failure ontology directly from the interaction patterns. We demonstrate the utility of our techniques using a large-scale oceanographic sensor/actuator network.