Heterogeneous process state capture and recovery through Process Introspection

  • Authors:
  • Adam Ferrari; Steve J. Chapin; Andrew Grimshaw

  • Affiliations:
  • University of Virginia, Charlottesville, VA 22904, USA; Syracuse University, Syracuse, NY 13244, USA; University of Virginia, Charlottesville, VA 22904, USA

  • Venue:
  • Cluster Computing
  • Year:
  • 2000

Abstract

The ability to capture the state of a process and later recover that state in the form of an equivalent running process is the basis for a number of important features in parallel and distributed systems. Adaptive load sharing and fault tolerance are well‐known examples. Traditional state capture mechanisms have employed an external agent (such as the operating system kernel) to examine and capture process state. However, the increasing prevalence of heterogeneous cluster and “metacomputing” systems as high‐performance computing platforms has prompted investigation of process‐internal state capture mechanisms. Perhaps the greatest advantage of the process‐internal approach is the ability to support cross‐platform state capture and recovery, an important feature in heterogeneous environments. Among the perceived disadvantages of existing process‐internal mechanisms are poor performance in multiple respects, and difficulty of use in terms of programmer effort. In this paper we describe a new process‐internal state capture and recovery mechanism: Process Introspection. Experiences with this system indicate that the perceived disadvantages associated with process‐internal mechanisms can be largely overcome, making this approach to state capture an appropriate one for cluster and metacomputing environments.