Dependability Modeling and Analysis of Distributed Programs

Authors:
N. Lopez-Benitez
Venue:
IEEE Transactions on Software Engineering
Year:
1994



Abstract

This paper presents a modeling approach based on stochastic Petri nets for estimating the reliability and availability of programs in a distributed computing system. In this environment, a program executes successfully only if it can access the files it needs, which are distributed throughout the system. The use of stochastic Petri nets is demonstrated by extending a basic reliability model to account for repair actions when faults occur. Two models are discussed: the global repair model, which assumes a centralized repair team that restores the system to its original state once a failure state is reached, and the local repair model, which assumes that each repair is carried out at the node where the fault occurred. The global repair model is useful for evaluating the availability of programs (or of the supporting hardware) subject to hardware faults that are repaired globally; because repair is global, the programs of interest can be interrupted. The local repair model, by contrast, can be used to evaluate program reliability in the presence of hardware faults that are repaired without interrupting the normal operation of the system.
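The difference between the two repair models can be illustrated with a small Markov chain of the kind a stochastic Petri net with exponentially timed transitions reduces to. The following is a minimal sketch and not the paper's model. It assumes a hypothetical system in which one file is replicated on two nodes and the program needs at least one copy. The per-node failure rate `lam`, the repair rate `mu`, and both function names are assumptions introduced here for illustration.

```python
import math


def global_repair_availability(lam, mu):
    """Steady-state availability under the assumed global repair model.

    States count the file copies still up: 2 -> 1 -> 0.
    No repair happens until the failure state 0 is reached; a central
    repair then restores the system to state 2 at rate mu (mu > 0).
    """
    # Balance equations: pi1 = 2 * pi2 and pi0 = (2 * lam / mu) * pi2.
    pi2 = 1.0 / (3.0 + 2.0 * lam / mu)
    pi0 = (2.0 * lam / mu) * pi2
    return 1.0 - pi0


def local_repair_reliability(lam, mu, t, terms=200):
    """Reliability R(t) under the assumed local repair model.

    A failed node is repaired at rate mu while the other keeps serving
    the file. State 0 (both copies down) is absorbing, so R(t) is the
    probability of not having been absorbed by time t. Computed by
    uniformization; `terms` must comfortably exceed (2*lam + mu) * t.
    """
    q = 2.0 * lam + mu  # uniformization rate, at least every exit rate
    # One-step matrix I + Q/q restricted to the transient states:
    # index 0 = two copies up, index 1 = one copy up.
    p = [[1.0 - 2.0 * lam / q, 2.0 * lam / q],
         [mu / q, 1.0 - (lam + mu) / q]]
    v = [1.0, 0.0]               # start with both copies up
    weight = math.exp(-q * t)    # Poisson(k = 0; q * t)
    r = 0.0
    for k in range(terms):
        r += weight * (v[0] + v[1])
        v = [v[0] * p[0][0] + v[1] * p[1][0],
             v[0] * p[0][1] + v[1] * p[1][1]]
        weight *= q * t / (k + 1)
    return r
```

Calling `global_repair_availability(0.01, 1.0)` gives the long-run fraction of time the program can reach a file copy, while `local_repair_reliability(0.01, 1.0, 10.0)` gives the probability that it never loses access during the first 10 time units. Setting `mu=0.0` in the second function recovers the unrepaired parallel-system reliability `2*exp(-lam*t) - exp(-2*lam*t)`, which is a convenient sanity check.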