Model-based failure management for distributed reactive systems

Authors:
Vina Ermagan; Ingolf Krüger; Massimiliano Menarini
Affiliations:
University of California, San Diego, La Jolla, CA; University of California, San Diego, La Jolla, CA; University of California, San Diego, La Jolla, CA
Venue:
Proceedings of the 13th Monterey conference on Composition of embedded systems: scientific and industrial issues
Year:
2006

Abstract

Failure management is key to the development of safety-critical, distributed, reactive systems common in applications such as avionics, automotive, and sensor/actuator networks. Specific challenges to effective failure management include (i) developing an understanding of the application domain so as to define what constitutes a failure; (ii) disentangling failure management concepts at design and runtime; and (iii) detecting and mitigating failures at the level of systems-of-systems integration. In this paper, we address (i) and (ii) by developing a failure ontology for logical and deployment architectures, respectively, including a mapping between the two. This ontology is based on the interaction patterns (or services) defining the component interplay in a distributed system. We address (iii) by defining detectors and mitigators at the service/interaction level; we discuss how to derive detectors for a significant subset of the failure ontology directly from the interaction patterns. We demonstrate the utility of our techniques using a large-scale oceanographic sensor/actuator network.