Autonomy, particularly from a maintenance and fault-management perspective, is an increasingly desirable feature in embedded (and non-embedded) computer systems. Several factors drive this: the increasing pervasiveness of computer systems, the cost of failures, which can be catastrophic in a wide variety of critical systems, and the rising cost of, and strain on, the resources needed to maintain systems. A trigger system employed in real-time filtering of particle-collision data is a particularly challenging example of a class of large-scale real-time embedded systems that demand a high degree of fault resilience, owing to the large cost of operating the facilities and the potential for loss of irreplaceable data. Traditional redundancy-based approaches are not viable because the budget for fault tolerance, over and above the system cost, is limited. This paper presents an approach based on model-integrated computing that provides a set of tools for the system developer to specify, simulate, and synthesize autonomous fault-mitigative behaviors. A hierarchical, role-based organization of fault managers cleanly separates the data-processing interactions in the system from the fault-mitigative control interactions. The fault-mitigative behaviors, analogous to those of autonomous biological systems, are characterized as (1) reflex actions: highly autonomous, localized, and uncoordinated responses emanating from a single fault manager at any level of the hierarchy, and (2) healing actions: highly coordinated behaviors implemented as a sequence of interactions between multiple fault managers. The strength of the approach lies in specifying these behaviors as coordinated, interacting, hierarchical concurrent finite-state machines, which makes them formally analyzable.
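The distinction between reflex and healing actions can be illustrated with a minimal Python sketch. It is not the paper's tooling or its generated code: the `FaultManager` class, the three-state machine, the method names, and the fault labels are all hypothetical, chosen only to show a local, uncoordinated response next to a response coordinated through a parent manager.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    """Hypothetical states of one fault manager's state machine."""
    NOMINAL = auto()
    MITIGATING = auto()  # applying a local reflex action
    ESCALATED = auto()   # waiting for a coordinated healing action


@dataclass
class FaultManager:
    """One node in a hierarchy of fault managers (illustrative only)."""
    name: str
    parent: Optional["FaultManager"] = None
    children: List["FaultManager"] = field(default_factory=list)
    state: State = State.NOMINAL
    log: List[str] = field(default_factory=list)

    def add_child(self, child: "FaultManager") -> "FaultManager":
        child.parent = self
        self.children.append(child)
        return child

    def on_fault(self, fault: str, local: bool) -> None:
        if local:
            # Reflex action: handled entirely inside this manager,
            # with no messages to any other manager.
            self.state = State.MITIGATING
            self.log.append(f"reflex:{fault}")
            self.state = State.NOMINAL
        else:
            # Healing action: hand the fault to the parent, which
            # coordinates a response across its children.
            self.state = State.ESCALATED
            self.log.append(f"escalate:{fault}")
            if self.parent is not None:
                self.parent.heal(fault, origin=self)

    def heal(self, fault: str, origin: "FaultManager") -> None:
        # The parent sends a heal command to every child it manages.
        self.log.append(f"heal:{fault}:from={origin.name}")
        for child in self.children:
            child.on_heal_command(fault)

    def on_heal_command(self, fault: str) -> None:
        self.log.append(f"healed:{fault}")
        self.state = State.NOMINAL
```

In this sketch a reflex leaves every other manager's log untouched, while a healing action produces entries in the parent and in all of its children. Keeping each manager as a small explicit state machine is what would make the interactions amenable to the kind of formal analysis the abstract mentions, although the sketch itself performs no such analysis.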