A structured definition of hardware- and software-fault-tolerant architectures is presented. Software-fault-tolerance methods are discussed, resulting in definitions for soft and solid faults. A soft software fault has a negligible likelihood of recurrence and is recoverable, whereas a solid software fault is recurrent under normal operations or cannot be recovered. A set of hardware- and software-fault-tolerant architectures is presented, and three of them are analyzed and evaluated. Architectures tolerating a single fault and architectures tolerating two consecutive faults are discussed separately. A sidebar addresses the cost issues related to software fault tolerance. The approach taken throughout is as general as possible, dealing with specific classes of faults or techniques only when necessary.
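The soft/solid distinction can be illustrated with a minimal sketch. It is not taken from the paper: it assumes that a raised exception stands in for a detected error and that plain re-execution is the only recovery action. The names `FaultKind` and `run_with_retry` are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


class FaultKind(Enum):
    NONE = "none"    # no error was detected
    SOFT = "soft"    # error did not recur on re-execution (recoverable)
    SOLID = "solid"  # error recurred on every attempt (not recovered)


def run_with_retry(
    operation: Callable[[], T], retries: int = 1
) -> Tuple[FaultKind, Optional[T]]:
    """Run `operation` and classify the outcome as NONE, SOFT, or SOLID.

    Assumption: any exception counts as a manifested fault, and retrying
    the same code is the recovery mechanism.
    """
    try:
        return FaultKind.NONE, operation()
    except Exception:
        pass  # an error was detected; fall through to recovery attempts

    for _ in range(retries):
        try:
            # Success on re-execution: the fault did not recur, so it is soft.
            return FaultKind.SOFT, operation()
        except Exception:
            continue

    # The fault recurred on every retry: treat it as solid.
    return FaultKind.SOLID, None
```

Under these assumptions, an operation that fails once and then succeeds is classified as a soft fault, while one that fails on every attempt is classified as solid. Retrying identical code cannot mask a solid fault, so tolerating one would need a different recovery action, for example a diversely designed variant of the operation.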