Two complementary methods are employed to assure reliable computing: fault-intolerance and fault-tolerance. Fault-intolerance depends on eliminating the causes of unreliability before the computing process starts. Fault-tolerance instead employs protective redundancy during the computing process to detect and correct unreliable functioning. A balanced allocation of reliability resources between the two methods appears to offer the best practical solution. The paper reviews current fault-tolerance practices in system architecture and discusses their relevance to software systems.
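One common form of protective redundancy is majority voting over replicated computations, as in triple modular redundancy (TMR). TMR is chosen here only as an illustration; the abstract does not say which techniques the paper covers. The following is a minimal Python sketch that assumes deterministic replicas returning hashable values. The names `majority_vote`, `run_redundant` and `VotingError` are hypothetical. A single faulty replica out of three is outvoted, which masks the fault. When no strict majority exists, the disagreement is detected and reported.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class VotingError(Exception):
    """Raised when the redundant results contain no strict majority (fault detected)."""


def majority_vote(results: Sequence[T]) -> T:
    """Return the value produced by a strict majority of replicas.

    Assumption: results are hashable and comparable for equality.
    With three replicas (TMR), one wrong result is outvoted (fault masked).
    """
    if not results:
        raise VotingError("no results to vote on")
    value, count = Counter(results).most_common(1)[0]
    # Require a strict majority; otherwise report the disagreement instead of guessing.
    if count * 2 <= len(results):
        raise VotingError(f"no majority among {list(results)!r}")
    return value


def run_redundant(replicas: Sequence[Callable[[], T]]) -> T:
    """Run each (hypothetical) replica once and vote on their outputs."""
    return majority_vote([replica() for replica in replicas])


if __name__ == "__main__":
    def good() -> int:
        return 42

    def faulty() -> int:
        return 41  # simulated erroneous replica

    print(run_redundant([good, faulty, good]))
```

Requiring a strict majority means that a tie, such as two replicas disagreeing, raises an error rather than being resolved silently. In this sketch, that is the boundary between correcting a fault and only detecting it.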