Based on measurements from two DEC VAXcluster multicomputer systems, the issue of correlated failures is addressed. In particular, the characteristics of correlated failures, their impact on dependability, and their modeling are discussed. The data show that most correlated failures are related to errors in shared resources and propagate from one machine to another. Comparisons between measurement-based models and analytical models that assume failure independence show that the impact of correlated failures on dependability is significant. Two validated models, the c-dependent model and the p-dependent model, are developed to evaluate the dependability of systems with correlated failures.
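
To illustrate why an independence assumption can understate the risk, the following minimal sketch compares the probability that two machines are down together with and without correlation. It is a generic textbook identity for two identically distributed Bernoulli failure indicators, not the c-dependent or p-dependent model from the paper. The function names, the per-machine failure probability `p`, and the correlation coefficient `rho` are all assumptions introduced here for illustration.

```python
def joint_failure_prob(p: float, rho: float) -> float:
    """Probability that both of two machines are failed at the same time.

    Assumption (illustrative only): each machine fails with the same marginal
    probability p, and the two failure indicators have correlation rho.
    For Bernoulli indicators, Cov = rho * p * (1 - p), so
    P(both fail) = p^2 + rho * p * (1 - p).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch only covers non-negative correlation in [0, 1]")
    return p * p + rho * p * (1.0 - p)


def underestimation_factor(p: float, rho: float) -> float:
    """Ratio of the correlated joint-failure probability to the independent one."""
    if p == 0.0:
        raise ValueError("p must be positive to form the ratio")
    return joint_failure_prob(p, rho) / joint_failure_prob(p, 0.0)


if __name__ == "__main__":
    p = 0.01  # hypothetical per-machine failure probability
    for rho in (0.0, 0.05, 0.1):
        print(rho, joint_failure_prob(p, rho), underestimation_factor(p, rho))
```

With these hypothetical numbers, even a small positive correlation makes the joint failure probability several times larger than the independent estimate `p * p`, which is the kind of gap the comparison in the abstract refers to.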