Evaluating energy savings for checkpoint/restart

  • Authors:
  • Bryan Mills; Ryan E. Grant; Kurt B. Ferreira; Rolf Riesen

  • Affiliations:
  • University of Pittsburgh; Sandia National Laboratories; Sandia National Laboratories; IBM Research - Ireland

  • Venue:
  • E2SC '13: Proceedings of the 1st International Workshop on Energy Efficient Supercomputing
  • Year:
  • 2013

Abstract

The U.S. Department of Energy has identified resilience and energy consumption as key challenges for future extreme-scale systems. All checkpoint/restart methods require I/O to local or remote storage. Efforts are under way to minimize data movement and increase scalability. Nevertheless, the energy consumed by fault resilience methods will increase with system size. It is therefore important to understand the performance overhead of each fault resilience method in conjunction with its energy consumption. In this paper we explore throttling CPU power consumption during I/O-intensive checkpoint operations of real applications. We find that total energy savings of 10% are possible with little impact on application time to solution.
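
The trade-off described in the abstract can be illustrated with a simple energy model, in which energy is power multiplied by time for each phase of a run. The sketch below is only an illustration under stated assumptions: the function names, phase durations, power levels, and checkpoint slowdown are all hypothetical and are not taken from the paper or its measurements.

```python
# Minimal sketch of a two-phase energy model (compute + checkpoint I/O).
# All numbers below are hypothetical and chosen only for illustration;
# they are NOT measurements or results from the paper.

def phase_energy_joules(power_watts, seconds):
    """Energy of one phase: E = P * t."""
    return power_watts * seconds


def run_energy_joules(compute_s, checkpoint_s, compute_w, checkpoint_w):
    """Total energy of a run made of a compute phase and a checkpoint phase."""
    return (phase_energy_joules(compute_w, compute_s)
            + phase_energy_joules(checkpoint_w, checkpoint_s))


def relative_savings(baseline_j, throttled_j):
    """Fraction of baseline energy saved by the throttled run."""
    return (baseline_j - throttled_j) / baseline_j


# Hypothetical baseline: CPU runs at full power during checkpoint I/O.
baseline = run_energy_joules(compute_s=3000.0, checkpoint_s=1000.0,
                             compute_w=200.0, checkpoint_w=200.0)

# Hypothetical throttled run: lower CPU power during checkpoints,
# at the cost of an assumed 5% longer checkpoint phase.
throttled = run_energy_joules(compute_s=3000.0, checkpoint_s=1050.0,
                              compute_w=200.0, checkpoint_w=120.0)

print(f"relative energy savings: {relative_savings(baseline, throttled):.4f}")
```

In this model the savings depend on how much of the run is spent checkpointing and on how much the throttled checkpoint phase is lengthened, which is why performance overhead and energy consumption need to be considered together.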