Availability management of distributed programs and services

  • Authors:
  • Markus Endler

  • Affiliations:
  • Departamento de Ciência da Computação, IME-Universidade de São Paulo, São Paulo, Brazil

  • Venue:
  • CASCON '96: Proceedings of the 1996 Conference of the Centre for Advanced Studies on Collaborative Research
  • Year:
  • 1996

Abstract

Modern distributed applications place increasing demands on the high availability, automatic management, and dynamic configuration of their software systems. This paper presents the architecture of Sampa, a System for Availability Management of Process-based Applications, which aims to meet these requirements. The system is designed to support the management of fault-tolerant, DCE-based distributed programs according to user-provided, application-specific availability specifications. It is intended to detect and automatically react to faults such as node crashes, network partitions, process crashes, and hang-ups. In this paper, we focus on the design of some of its services, namely the monitoring, checkpointing, and configuration management facilities, and show how they can be used to manage a generic fault-tolerant service.
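The abstract does not say how Sampa's monitoring service detects faults. As a purely illustrative sketch of one common technique, timeout-based heartbeat monitoring, the code below treats a process as suspect once it has been silent for longer than a timeout. A crash, a hang-up, and a network partition all look the same from the monitor's side. The names `HeartbeatMonitor`, `react`, and the `restart` callback are hypothetical and are not Sampa's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatMonitor:
    """Hypothetical timeout-based failure detector (not Sampa's actual design)."""
    timeout: float                  # seconds of silence before a process is suspected
    clock: Callable[[], float]      # injectable time source, e.g. time.monotonic
    last_seen: Dict[str, float] = field(default_factory=dict)

    def register(self, process_id: str) -> None:
        # Start monitoring a process as if it had just reported in.
        self.last_seen[process_id] = self.clock()

    def heartbeat(self, process_id: str) -> None:
        # Called whenever an "I am alive" message arrives from the process.
        self.last_seen[process_id] = self.clock()

    def suspects(self) -> List[str]:
        # Crashed, hung, or partitioned processes all stop sending heartbeats,
        # so each one is reported once its silence exceeds the timeout.
        now = self.clock()
        return sorted(pid for pid, t in self.last_seen.items() if now - t > self.timeout)


def react(monitor: HeartbeatMonitor, restart: Callable[[str], None]) -> List[str]:
    """Hypothetical automatic reaction: restart every suspected process."""
    failed = monitor.suspects()
    for pid in failed:
        restart(pid)            # e.g. restart from the last checkpoint (assumption)
        monitor.register(pid)   # resume monitoring the restarted process
    return failed


if __name__ == "__main__":
    now = [0.0]                 # simulated clock for a deterministic demo
    mon = HeartbeatMonitor(timeout=5.0, clock=lambda: now[0])
    mon.register("server-a")
    mon.register("server-b")
    now[0] = 3.0
    mon.heartbeat("server-a")   # only server-a reports in
    now[0] = 7.0
    print(mon.suspects())       # prints ['server-b']
```

Injecting the clock keeps the detector testable. A real monitor would use `time.monotonic` and would have to choose the timeout to balance detection latency against false suspicions of slow but healthy processes.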