Model-based failure management for distributed reactive systems

  • Authors:
  • Vina Ermagan;Ingolf Krüger;Massimiliano Menarini

  • Affiliations:
  • University of California, San Diego, La Jolla, CA;University of California, San Diego, La Jolla, CA;University of California, San Diego, La Jolla, CA

  • Venue:
  • Proceedings of the 13th Monterey conference on Composition of embedded systems: scientific and industrial issues
  • Year:
  • 2006

Abstract

Failure management is key to the development of safety-critical, distributed, reactive systems, which are common in applications such as avionics, automotive systems, and sensor/actuator networks. Specific challenges to effective failure management include (i) developing an understanding of the application domain so as to define what constitutes a failure; (ii) disentangling failure management concepts at design time and at runtime; and (iii) detecting and mitigating failures at the level of systems-of-systems integration. In this paper, we address (i) and (ii) by developing a failure ontology for logical and deployment architectures, respectively, including a mapping between the two. This ontology is based on the interaction patterns (or services) that define the interplay of components in a distributed system. We address (iii) by defining detectors and mitigators at the service/interaction level; we discuss how to derive detectors for a significant subset of the failure ontology directly from the interaction patterns. We demonstrate the utility of our techniques using a large-scale oceanographic sensor/actuator network.