An error propagation model is developed for multimodule computing systems, in which the main parameters are the distribution functions of error propagation times. A digraph represents the multimodule computing system, and error propagation in the system is modeled by general distributions of error propagation times between all pairs of modules. Two algorithms are developed to compute these distributions systematically and efficiently. Experiments are also conducted to measure the distributions of error propagation times on the Fault-Tolerant Multiprocessor (FTMP). Statistical analysis of the experimental data shows that the error propagation times in FTMP do not follow any well-known distribution, which justifies the use of general distributions in the present model.
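To make the digraph view concrete, the following Python sketch estimates the distribution of the error propagation time between two modules by simulation. It is not one of the two algorithms mentioned in the abstract, which are not described here. It assumes, purely for illustration, that each edge has an independent propagation delay drawn from an arbitrary empirical sample set, and that an error reaches a module at the earliest arrival time over all paths. The module names, delay samples, and the helpers `from_samples`, `sample_propagation_times`, and `empirical_cdf` are hypothetical.

```python
import heapq
import random


def from_samples(samples):
    """Return a sampler that draws one delay from measured values (assumed data)."""
    return lambda rng: rng.choice(samples)


def sample_propagation_times(graph, source, rng):
    """One trial: draw a delay per edge, then find the earliest arrival time
    of the error at every reachable module (Dijkstra on the sampled delays)."""
    delays = {(u, v): draw(rng)
              for u, nbrs in graph.items() for v, draw in nbrs.items()}
    arrival = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > arrival.get(u, float("inf")):
            continue  # stale heap entry
        for v in graph.get(u, {}):
            cand = t + delays[(u, v)]
            if cand < arrival.get(v, float("inf")):
                arrival[v] = cand
                heapq.heappush(heap, (cand, v))
    return arrival


def empirical_cdf(graph, source, target, t, trials=10_000, seed=0):
    """Estimate P(error propagation time from source to target <= t)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        arrival = sample_propagation_times(graph, source, rng)
        if arrival.get(target, float("inf")) <= t:
            hits += 1
    return hits / trials


# Hypothetical three-module system; delays are made-up samples, not FTMP data.
graph = {
    "A": {"B": from_samples([1.0, 2.0, 5.0]), "C": from_samples([4.0, 9.0])},
    "B": {"C": from_samples([1.0, 1.5])},
    "C": {},
}

if __name__ == "__main__":
    for t in (2.0, 4.0, 6.0, 9.0):
        print(t, empirical_cdf(graph, "A", "C", t))
```

Because each edge sampler draws from raw samples instead of a fitted parametric family, the sketch reflects the abstract's point that general distributions are needed when measured propagation times fit no well-known distribution.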