Analyzing the effectiveness of fault-management architectures in layered distributed systems

Authors:
Olivia Das; C. Murray Woodside
Affiliations:
Department of Systems and Computer Engineering, Carleton University, 1125 Colonel By Drive, Ottawa, Ont., Canada K1S 5B6; Department of Systems and Computer Engineering, Carleton University, 1125 Colonel By Drive, Ottawa, Ont., Canada K1S 5B6
Venue:
Performance Evaluation - Dependable systems and networks-performance and dependability symposium (DSN-PDS) 2002: Selected papers
Year:
2004

Abstract

Fault-management infrastructure in distributed systems includes manager processes and agents that interact in various ways to monitor the status of the application software and hardware. These additional components and interactions become part of the system architecture, and they affect system availability. This paper describes an architecture model called MAMA (Model for Availability Management Architecture), together with an architecture definition language, MAMA-dl, that covers the combined application and management parts, and presents an analysis of the model. The analysis extends the Fault Tolerant Layered Queueing Model to account for the propagation of knowledge of the system state through the management sub-architecture. The model is demonstrated on the problem of placing manager tasks in a system.