Safety-Critical Systems Built with COTS

  • Authors:
  • Joseph A. Profeta III; Nikos P. Andrianos; Bing Yu; Barry W. Johnson; Todd A. DeLong; David Guaspari; Damir Jamsek

  • Venue:
  • Computer
  • Year:
  • 1996

Abstract

The rail transportation industry has always been very safety conscious, given the potential for catastrophic system failure. The replacement of traditional relay-based systems with microprocessor-based control systems over the past two decades has made it all the more important to prove software correctness. At the same time, competitive pressure has led to increased use of COTS (commercial off-the-shelf) equipment in safety-critical systems, making it imperative to extend proven safety techniques to COTS-based systems as well. To this end, we have developed the Vital Framework (V_Frame), which is used to build a safety-critical platform from COTS hardware and software. The key technologies in this framework are formal methods, information redundancy, a proprietary data format, and a concurrent checking scheme. Combining these technologies yields a real-time, checkable correctness criterion that is a signature of the application's algorithm structure and is independent of both the hardware and the operating system. Because it is a contradiction for a CPU to check itself while also guaranteeing that it is correctly executing the intended semantics of an application, V_Frame uses a fail-safe real-time application checker (RAC) outside the domain of the CPU to ensure the correct (and proper) execution of the application. Correctness criteria are placed on the RAC, and they must be met before any outputs are allowed to be sent to the field. These criteria are generated from the application itself rather than from its compiled version. This raises the checking process into the information universe, as opposed to the physical universe of specific faults, such as faults in the CPU registers or in the firmware running on the CPU. If an error is detected, the system is placed into a known safe state.
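To make the idea of an external checker that gates outputs concrete, here is a minimal sketch. It is NOT V_Frame's proprietary data format or checking scheme, which the abstract does not describe. Instead, it substitutes a classic information-redundancy technique, an AN arithmetic code (each value x is carried as A*x), combined with a simple control-flow signature. The constant `A`, the names `ExternalChecker`, `signature`, and `SAFE_STATE`, and the signature function are all hypothetical and chosen only for illustration. The sketch shows the general pattern: the checker knows an expected signature derived from the application's structure ahead of time. It releases a decoded output only if the data is a valid code word and the observed execution signature matches. Otherwise it falls back to a known safe state.

```python
from dataclasses import dataclass

A = 58659  # hypothetical AN-code constant (illustrative only, not from V_Frame)
SAFE_STATE = "ALL_SIGNALS_RED"  # hypothetical known-safe field output


def encode(x: int) -> int:
    """Carry functional value x as the redundant code word A*x."""
    return A * x


def coded_add(cx: int, cy: int) -> int:
    """Addition stays inside the code: A*x + A*y == A*(x + y)."""
    return cx + cy


def is_valid(cw: int) -> bool:
    """A code word is valid only if it is an exact multiple of A."""
    return cw % A == 0


def signature(block_ids) -> int:
    """Fold the sequence of executed block IDs into a 32-bit signature."""
    sig = 0
    for b in block_ids:
        sig = (sig * 31 + b) % (2**32)
    return sig


@dataclass
class ExternalChecker:
    """Hypothetical checker that conceptually runs outside the application CPU.

    expected_signature is computed ahead of time from the application's
    structure, not from the CPU's runtime state.
    """
    expected_signature: int

    def gate(self, coded_output: int, run_signature: int):
        # Release the output only if both the data and the control flow check out.
        if is_valid(coded_output) and run_signature == self.expected_signature:
            return coded_output // A  # decode and send to the field
        return SAFE_STATE             # any detected error -> known safe state


checker = ExternalChecker(expected_signature=signature([1, 2, 3]))
out = coded_add(encode(4), encode(5))
print(checker.gate(out, signature([1, 2, 3])))      # 9
print(checker.gate(out + 1, signature([1, 2, 3])))  # ALL_SIGNALS_RED (corrupted data)
print(checker.gate(out, signature([1, 3, 2])))      # ALL_SIGNALS_RED (wrong control flow)
```

In this sketch, everything runs in one process for brevity. The point the abstract makes is that the real checker must sit outside the CPU's fault domain, so that a faulty CPU cannot also corrupt the check.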