Performance evaluation of an application-level checkpointing solution on grids

  • Authors:
  • Gabriel Rodríguez;Xoán C. Pardo;María J. Martín;Patricia González

  • Affiliations:
  • Computer Architecture Group, University of A Coruña, Spain;Computer Architecture Group, University of A Coruña, Spain;Computer Architecture Group, University of A Coruña, Spain;Computer Architecture Group, University of A Coruña, Spain

  • Venue:
  • Future Generation Computer Systems
  • Year:
  • 2010

Abstract

In recent years, there has been a significant effort to develop middleware that facilitates the execution of applications on grid infrastructures. However, support for fault-tolerant execution continues to be scarce. The CPPC-G framework is a service-based architecture designed to provide efficient fault-tolerance mechanisms for the execution of sequential and parallel applications on grids. Applications to be managed by CPPC-G are expected to be preprocessed with CPPC (ComPiler for Portable Checkpointing), a tool for automatically inserting portable checkpoint instrumentation into the code of parallel applications. Built on top of existing Globus services, CPPC-G services are in charge of submitting and monitoring CPPC applications, managing generated checkpoint files, detecting failures, and automatically restarting failed executions. In this paper, the feasibility of this approach is assessed by measuring the performance of CPPC-G and quantifying its impact on application performance. Results show that the increase in overall throughput and availability comes with minor performance degradation.