Independent checkpointing in a heterogeneous grid environment

  • Authors:
  • Eugen Feller;John Mehnert-Spahn;Michael Schoettner;Christine Morin

  • Affiliations:
  • INRIA, Centre Rennes - Bretagne Atlantique, Campus Universitaire de Beaulieu, 35042 Rennes Cedex, France;Institut für Informatik, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40255 Düsseldorf, Germany;Institut für Informatik, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40255 Düsseldorf, Germany;INRIA, Centre Rennes - Bretagne Atlantique, Campus Universitaire de Beaulieu, 35042 Rennes Cedex, France

  • Venue:
  • Future Generation Computer Systems
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

The EU-funded XtreemOS project implements an open-source grid operating system based on Linux. In order to provide fault tolerance and migration for grid applications, it integrates a distributed grid-checkpointing service called XtreemGCP. This service is designed to support various checkpointing protocols and different checkpointer packages (e.g. BLCR, LinuxSSI, OpenVZ, etc.) in a transparent manner through a uniform checkpointer interface. In this paper, we present the integration of a backward error recovery protocol based on independent checkpointing into the XtreemGCP service. The solution we propose is not checkpointer bound and thus can be transparently used on top of any checkpointer package. To evaluate the prototype we run it within a heterogeneous environment composed of single-PC nodes and a Single System Image (SSI) cluster. The experimental results demonstrate the capability of the XtreemGCP service to integrate different checkpointing protocols and independently checkpoint a distributed application within a heterogeneous grid environment. Moreover, the performance evaluation also shows that our solution outperforms the existing coordinated checkpointing protocol in terms of scalability.