FTCS'95 Proceedings of the Twenty-Fifth international conference on Fault-tolerant computing
Most existing dependability modeling and evaluation tools are designed for building and solving commonly used models, with an emphasis on solution techniques rather than on identifying realistic models from measurements. This paper introduces MEASURE+, a measurement-based dependability analysis package. Given measured data from real systems in a specified format, MEASURE+ can generate appropriate dependability models and measures, including Markov and semi-Markov models, k-out-of-n availability models, failure distribution and hazard functions, and correlation parameters. These models and measures, obtained from data, are valuable for understanding actual error and failure characteristics, identifying system bottlenecks, evaluating the dependability of real systems, and verifying assumptions made in analytical models. The paper illustrates MEASURE+ by applying it to data from a VAXcluster multicomputer system. Models of field failure behavior identified by MEASURE+ indicate that neither traditional models assuming failure independence nor the few that take correlation into account are representative of the actual occurrence process of correlated failures.
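To make the point about failure independence concrete, the following is a minimal sketch, not MEASURE+ itself or its input format. It compares a k-out-of-n availability computed analytically under an independence assumption with the availability read directly from measurements. The function names and the `samples` data are hypothetical and chosen only for illustration.

```python
from math import comb


def k_out_of_n_availability(k, n, a):
    """Analytical availability: probability that at least k of n
    independent machines are up, each being up with probability a."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def empirical_k_out_of_n(samples, k):
    """Measured availability: fraction of observation instants at
    which at least k machines were up."""
    return sum(1 for row in samples if sum(row) >= k) / len(samples)


# Hypothetical measurements: one row per observation instant,
# one column per machine (1 = up, 0 = down). Failures cluster in
# the same rows, i.e. they are correlated across machines.
samples = [
    (1, 1, 1),
    (1, 1, 1),
    (1, 1, 1),
    (0, 0, 1),
    (1, 1, 1),
    (1, 1, 1),
    (0, 0, 0),
    (1, 1, 1),
]
n = 3

# Per-machine availability estimated from the same data.
per_machine = sum(sum(row) for row in samples) / (n * len(samples))

model = k_out_of_n_availability(2, n, per_machine)  # assumes independence
measured = empirical_k_out_of_n(samples, 2)         # taken from the data

print(f"independence model: {model:.3f}, measured: {measured:.3f}")
```

On this made-up data set the independence model reports a higher 2-out-of-3 availability than the measurements do, because the failures occur together rather than independently. This is only an illustration of why a model fitted under an independence assumption can misrepresent correlated failures; it says nothing about the magnitude of the effect in the VAXcluster data.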