Future deep-space exploration missions face the challenge of building more capable autonomous spacecraft and planetary rovers. Given the communication latencies and bandwidth limitations of such missions, increased autonomy becomes mandatory, as do enhanced on-board computational capabilities in deep space and in time-critical situations. This will dramatically change the way missions are conducted and supported by on-board computing systems. Specifically, the traditional approach of relying exclusively on radiation-hardened hardware and modular redundancy will not be able to deliver the required computational power. As a consequence, such systems are expected to include high-capability, low-power components based on emerging commercial-off-the-shelf (COTS) multi-core technology. In this paper we describe the design of a generic framework for introspection that supports runtime monitoring and analysis of program execution as well as feedback-oriented recovery from faults. Our focus is on providing flexible software fault tolerance matched to the requirements and properties of applications by exploiting knowledge that is either contained in an application knowledge base, provided by users, or automatically derived from specifications. A prototype implementation is currently in progress at the Jet Propulsion Laboratory, California Institute of Technology, targeting a cluster of Cell Broadband Engines.