The mid-century "space race" was a major impetus for the development of fault-tolerant computing. Over the succeeding 25 years researchers expanded the concept of fault tolerance and refined the techniques for achieving it. Nevertheless, the bottom-up approach, entailing an infrastructure of autonomously fault-tolerant subsystems integrated with global fault tolerance functions, is less common today than the top-down approach, which relies on off-the-shelf (OTS) subsystems and a global monitoring function. A design paradigm for the systematic treatment of fault tolerance involves four steps: specification, implementation, evaluation, and modification. The paradigm offers a way to minimize the probability of oversights, mistakes, and inconsistencies that may occur during the implementation of fault tolerance. In spite of the long-range merits of this bottom-up approach, time and cost constraints often lead developers to use OTS subsystems when designing systems that are expected to be highly dependable. Even the Pentium Pro, which appears to have the most complete set of fault tolerance functions among contemporary microprocessors, has major drawbacks. Moreover, systems built from OTS subsystems are difficult to retrofit for fault tolerance. Without hardware support for fault tolerance, the only solution is to build a software monitor subsystem that tries to check all subsystems for indications of failure. But the monitor itself is unprotected because it resides and executes on an OTS processor. Researchers would do well to consider the human immune system as a model for systems in which fault tolerance is an integral attribute of every hardware element.