SpotMPI: a framework for auction-based HPC computing using amazon spot instances

  • Authors:
  • Moussa Taifi;Justin Y. Shi;Abdallah Khreishah

  • Affiliations:
  • Temple University, Computer Science Department, Philadelphia, PA;Temple University, Computer Science Department, Philadelphia, PA;Temple University, Computer Science Department, Philadelphia, PA

  • Venue:
  • ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

The economy of scale offers cloud computing virtually unlimited cost effective processing potentials. Theoretically, prices under fair market conditions should reflect the most reasonable costs of computations. The fairness is ensured by the mutual agreements between the sellers and the buyers. Resource use efficiency is automatically optimized in the process. While there is no lack of incentives for the cloud provider to offer auction-based computing platform, using these volatile platform for practical computing is a challenge for existing programming paradigms. This paper reports a methodology and a toolkit designed to tame the challenges for MPI applications. Unlike existing MPI fault tolerance tools, we emphasize on dynamically adjusted optimal checkpoint-restart (CPR) intervals. We introduce a formal model, then a HPC application toolkit, named SpotMPI, to facilitate the practical execution of real MPI applications on volatile auction-based cloud platforms. Our models capture the intrinsic dependencies between critical time consuming elements by leveraging instrumented performance parameters and publicly available resource bidding histories. We study algorithms with different computing v.s. communication complexities. Our results show non-trivial insights into the optimal bidding and application scaling strategies.