Today's concomitant needs for higher computing power and reliability have increased the relevance of multiple-processor fault-tolerant systems. Multiple functional units improve the raw performance (throughput, response time, etc.) of the system, and, as units fail, the system may continue to function, albeit with degraded performance. Such systems, and fault-tolerant systems in general, are not adequately characterized by separate performance and reliability measures. A composite measure for the performance and reliability of a fault-tolerant system observed over a finite mission time is analyzed. A Markov chain model is used for the system state-space representation, and transient analysis is performed to obtain closed-form solutions for the density and moments of the composite measure. Only failures that cannot be repaired before the end of the mission are modeled. The time spent in a specific system configuration is assumed to be long enough to permit the use of a hierarchical model, with static measures quantifying the performance of the system in each individual configuration. For a multiple-processor system, where performance measures are usually associated with and aggregated over many jobs, this is tantamount to assuming that the time to process a job is much smaller than the time between failures. An extension of the results to general acyclic Markov chain models is included.
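To make the composite measure concrete, here is a minimal sketch, not the closed-form method of the paper. It assumes the measure is the reward accumulated over the mission, Y(t) = ∫₀ᵗ r(X(s)) ds, where each configuration's reward rate r_j is a static performance figure supplied by a lower-level model, as in the hierarchical approach described above. The sketch computes only the mean E[Y(t)] numerically, using uniformization, a standard substitute technique. The model is a hypothetical nonrepairable multiprocessor whose state j counts working processors, each failing independently at rate `fail_rate`. That chain is acyclic, and all names (`degradable_multiprocessor`, `expected_accumulated_reward`) are illustrative assumptions. With the linear reward r_j = j, the result can be checked against the closed form N(1 − e^(−λt))/λ.

```python
import math


def degradable_multiprocessor(n_proc, fail_rate):
    """Hypothetical model: n_proc processors, no repair; state j = j processors up.

    Returns the CTMC generator Q (list of lists) and the initial distribution.
    Transitions only go from j to j-1, so the chain is acyclic.
    """
    size = n_proc + 1
    Q = [[0.0] * size for _ in range(size)]
    for j in range(1, size):
        Q[j][j - 1] = j * fail_rate  # any one of the j working processors fails
        Q[j][j] = -j * fail_rate
    p0 = [0.0] * size
    p0[n_proc] = 1.0  # the mission starts with all processors working
    return Q, p0


def expected_accumulated_reward(Q, rewards, p0, t, eps=1e-12):
    """E[integral_0^t r(X(s)) ds] for a CTMC, computed by uniformization.

    Uses integral_0^t p(s) ds = (1/lam) * sum_k pi_k * P(N(lam t) >= k+1),
    where pi_k = p0 P^k, P = I + Q/lam, and N is a Poisson process.
    Assumes lam * t is moderate (exp(-lam * t) must not underflow).
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate
    if lam == 0.0:
        # No transitions: reward accrues at the initial rate for the whole mission.
        return t * sum(p * r for p, r in zip(p0, rewards))
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    pi = list(p0)                 # pi_0 = p0
    poisson = math.exp(-lam * t)  # Poisson(lam t) probability mass at k = 0
    cdf = poisson                 # P(N <= k)
    total = 0.0
    k = 0
    while True:
        tail = 1.0 - cdf          # P(N >= k + 1)
        total += tail * sum(p * r for p, r in zip(pi, rewards))
        if tail < eps:            # remaining terms are negligible
            break
        k += 1
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        poisson *= lam * t / k
        cdf += poisson
    return total / lam


if __name__ == "__main__":
    Q, p0 = degradable_multiprocessor(4, 0.1)
    rewards = list(range(5))  # hypothetical r_j = j: capacity of j processors
    print(expected_accumulated_reward(Q, rewards, p0, 10.0))
    print(4 * (1 - math.exp(-1.0)) / 0.1)  # closed form for linear reward
```

The two printed values should agree to within the truncation tolerance. Replacing `rewards` with throughputs taken from a per-configuration queueing model gives the hierarchical setup. Obtaining the full density of Y(t), rather than just its mean, requires the kind of transient analysis the abstract describes.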