Checkpointing Facility on a Metasystem

  • Authors:
  • Yudith Cardinale; Emilio Hernández

  • Venue:
  • Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
  • Year:
  • 2001

Abstract

A metasystem allows seamless access to a collection of distributed computational resources. Checkpointing is an important service in high-throughput computing, especially for process migration and recovery after a system crash. This paper describes our experience incorporating checkpointing and recovery facilities into a Java-based metasystem. Our case study is SUMA, a metasystem for executing both sequential and parallel Java bytecode. We also present preliminary results on checkpointing and recovery overhead for single-node applications.
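
The abstract does not say how checkpoints are taken, so the following is only an illustrative sketch of the general idea, not the mechanism used in the metasystem described here. It assumes application-level checkpointing with standard Java object serialization: the program keeps all resumable state in one serializable object, writes it to disk periodically, and reloads it after a simulated crash. The class `CheckpointDemo`, the `SumState` type, and the methods `save`, `load`, and `run` are hypothetical names introduced for this example.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class; an assumption for illustration, not the paper's code.
class CheckpointDemo {

    // All state needed to resume the computation lives in one serializable object.
    static final class SumState implements Serializable {
        private static final long serialVersionUID = 1L;
        long next = 1;   // next integer to add
        long total = 0;  // running sum so far
    }

    // Checkpoint: write the current state to a file.
    static void save(SumState state, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(state);
        }
    }

    // Recovery: read the most recently saved state back from the file.
    static SumState load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (SumState) in.readObject();
        }
    }

    // Add the integers state.next..limit, saving a checkpoint
    // whenever state.next reaches a multiple of `interval`.
    static long run(SumState state, long limit, long interval, Path file) throws IOException {
        while (state.next <= limit) {
            state.total += state.next;
            state.next++;
            if (state.next % interval == 0) {
                save(state, file);
            }
        }
        return state.total;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("checkpoint", ".ser");

        // First run stops at 500; treat this as the point where the process crashes.
        run(new SumState(), 500, 100, file);

        // Recovery: restart from the last checkpoint rather than from scratch.
        SumState recovered = load(file);
        System.out.println(run(recovered, 1000, 100, file)); // prints 500500

        Files.deleteIfExists(file);
    }
}
```

In this sketch the last checkpoint is written just before the value 500 is added, so the recovered run repeats that one step. The gap between the last checkpoint and the crash is the work lost on failure, and shortening the interval reduces it at the cost of more frequent writes, which is the kind of overhead trade-off a checkpointing service has to balance.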