This paper deals with the problem of reliability in a hardware/software system. More specifically, it deals with the strategy used to achieve reliability in a particular hardware/software system built by the author and his colleagues at Carnegie-Mellon University. Rather than dealing with the myriad details of the reliability aspects of this system, the paper focuses on the design philosophy, which aims at keeping the system operational even though the underlying hardware may be malfunctioning. This philosophy is essentially an extension of the 'modular' programming methodology, advocated by Parnas and others, to include dynamic error detection and recovery.