Operating system support to detect application hangs

  • Authors:
  • G. Carrozza;M. Cinque;D. Cotroneo;R. Natella

  • Affiliations:
  • Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli Federico II, Naples, Italy;Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli Federico II, Naples, Italy;Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli Federico II, Naples, Italy;Laboratorio CINI ITEM, Complesso Univ. M. S. Angelo, Naples, Italy

  • Venue:
  • VECoS'08: Proceedings of the Second International Conference on Verification and Evaluation of Computer and Communication Systems
  • Year:
  • 2008

Abstract

On-line failure detection is an essential means of controlling and assessing the dependability of complex and critical software systems. In such a context, effective detection strategies are required to minimize the possibility of catastrophic consequences. This objective is, however, difficult to achieve in complex systems, especially because of the several sources of non-determinism (e.g., multi-threading and distributed interaction) that may lead to software hangs, i.e., states in which the system is active but no longer capable of delivering its services. The paper proposes a detection approach to uncover application hangs. It exploits multiple indirect data gathered at the operating system level to monitor the system and to trigger alarms if the observed behavior deviates from the expected one. Fault injection experiments conducted on a research prototype show that combining several operating system monitors leads to a high quality of detection at an acceptable overhead.
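
The idea of combining several operating system monitors can be illustrated with a minimal sketch. This is not the authors' implementation: the monitor names, the expected ranges, and the voting rule (`min_votes`) are all hypothetical, chosen only to show how independent indirect indicators might be merged into a single hang alarm.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RangeMonitor:
    """Hypothetical monitor: flags a sample outside an expected [low, high] range."""
    name: str
    low: float
    high: float

    def is_anomalous(self, value: float) -> bool:
        return not (self.low <= value <= self.high)


def detect_hang(monitors: List[RangeMonitor],
                sample: Dict[str, float],
                min_votes: int = 2) -> bool:
    """Raise an alarm when at least `min_votes` monitors flag the sample.

    The voting rule is an assumption made for illustration; the abstract
    does not say how the individual monitors are combined.
    """
    votes = sum(1 for m in monitors if m.is_anomalous(sample[m.name]))
    return votes >= min_votes


# Hypothetical OS-level indicators with made-up expected ranges.
monitors = [
    RangeMonitor("syscalls_per_sec", low=10.0, high=5000.0),
    RangeMonitor("context_switches_per_sec", low=1.0, high=2000.0),
    RangeMonitor("io_bytes_per_sec", low=0.0, high=1e9),
]

healthy = {"syscalls_per_sec": 120.0,
           "context_switches_per_sec": 40.0,
           "io_bytes_per_sec": 2048.0}
hung = {"syscalls_per_sec": 0.0,
        "context_switches_per_sec": 0.0,
        "io_bytes_per_sec": 0.0}

healthy_alarm = detect_hang(monitors, healthy)
hung_alarm = detect_hang(monitors, hung)
```

Requiring agreement between monitors is one plausible way to trade false alarms against missed hangs: a single noisy indicator cannot trigger an alarm on its own, while a process that stops issuing system calls and stops being scheduled is flagged by two indicators at once.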