A framework is presented for integrating fault tolerance flexibly and easily into embedded parallel HPC applications. It consists of reusable fault tolerance modules that act at different levels and address common requirements. Application developers are relieved of ad hoc fault tolerance programming, without falling back on mediocre fault tolerance support at the operating system level. Integrating this functionality into real embedded applications validates the approach and gives promising results. This article focuses on fault tolerance mechanisms for synchronous and asynchronous communication between application threads running on system nodes.
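To make the idea of fault-tolerant synchronous communication between threads concrete, the following is a minimal sketch and not the framework's actual interface. It assumes a common technique (sequence numbers, acknowledgements, timeouts and retransmission) that the abstract does not itself specify. All names here (`LossyChannel`, `reliable_send`, `receiver`, `demo`) are hypothetical, and the "channel" is an in-process queue that deliberately drops a fixed number of messages so the behaviour is reproducible.

```python
import queue
import threading


class LossyChannel:
    """Hypothetical in-process channel that silently drops its first `drops` messages."""

    def __init__(self, drops=0):
        self._q = queue.Queue()
        self._drops = drops
        self._lock = threading.Lock()

    def send(self, item):
        with self._lock:
            if self._drops > 0:
                self._drops -= 1
                return  # simulate a message lost in transit
        self._q.put(item)

    def recv(self, timeout=None):
        # Raises queue.Empty if nothing arrives within `timeout` seconds.
        return self._q.get(timeout=timeout)


def reliable_send(data_ch, ack_ch, seq, payload, timeout=0.2, max_retries=5):
    """Synchronous send: block until the matching ack arrives, retransmitting on timeout.

    Returns the number of transmissions that were needed.
    """
    for attempt in range(1, max_retries + 1):
        data_ch.send((seq, payload))
        try:
            while True:
                if ack_ch.recv(timeout=timeout) == seq:
                    return attempt
                # Otherwise this is a stale ack for an older message: keep waiting.
        except queue.Empty:
            continue  # no ack in time, so retransmit
    raise TimeoutError(f"no ack for message {seq} after {max_retries} attempts")


def receiver(data_ch, ack_ch, delivered, stop):
    """Deliver each sequence number once, but acknowledge every copy received."""
    seen = set()
    while not stop.is_set():
        try:
            seq, payload = data_ch.recv(timeout=0.01)
        except queue.Empty:
            continue
        if seq not in seen:  # suppress duplicates caused by retransmission
            seen.add(seq)
            delivered.append(payload)
        ack_ch.send(seq)  # re-ack duplicates so a lost ack can be recovered


def demo():
    # Two data messages and one ack are lost before the channels behave.
    data_ch, ack_ch = LossyChannel(drops=2), LossyChannel(drops=1)
    delivered, stop = [], threading.Event()
    worker = threading.Thread(target=receiver, args=(data_ch, ack_ch, delivered, stop))
    worker.start()
    attempts = [reliable_send(data_ch, ack_ch, seq, f"msg{seq}") for seq in range(3)]
    stop.set()
    worker.join()
    return attempts, delivered


if __name__ == "__main__":
    print(demo())
```

In this sketch the sender blocks until delivery is confirmed, which corresponds to the synchronous case. An asynchronous variant could hand the retransmission loop to a separate thread and let the caller continue, at the cost of having to report failures later.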