Distance-Constrained Scheduling and Its Applications to Real-Time Systems
IEEE Transactions on Computers
This paper describes the experience of designing and implementing failure detection and reporting in a large distributed real-time system used for air traffic control (ATC). We believe that systematic analysis is needed to guide the design of failure detection and to track the large number of failures it must handle. Analyses such as how quickly failures must be detected should be performed carefully to avoid later redesigns. A comprehensive analysis also provides a basis for subsequently testing the design, a phase in which fault injection and extended testing are needed to evaluate and debug it. Failure detectors should identify specific failures so that appropriate reports and recovery actions can be initiated once a failure is detected.