Cloud computing benefits extensively from economies of scale to provide cost-effective computing. Recently, reliability has been introduced as a potential tradeoff point: compute resources are delivered with weaker availability guarantees in exchange for a further reduction in price. Fair market conditions create an environment in which both sellers and buyers of compute resources can benefit from trading, and more efficient resource use can potentially be achieved as a result. While auction-based infrastructure has many advantages, there are currently no practical computing platforms that can harness such volatile environments effectively. This work reports a methodology and a toolkit designed to address the challenges of running MPI applications on volatile, auctioned cloud resources. Specifically, we emphasize the use of dynamically adjusted optimal checkpoint-restart (CPR) intervals. We discuss an initial analytical model for working with price histories and selecting optimal checkpoint intervals. We also describe the SpotMPI toolkit, which can be used to achieve practical execution of MPI applications on volatile auction-based cloud platforms. The result of this exploration is a synthesis of the intrinsic dependencies that exist in MPI-based parallel applications with the publicly available price histories of HPC cloud resources on the Amazon cloud. We study algorithms with different computation vs. communication complexities. Our results yield counter-intuitive insights into optimal bidding and application scaling strategies.
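To make the idea of selecting a checkpoint interval from a price history concrete, here is a minimal Python sketch. It is not the paper's analytical model or part of SpotMPI. It uses Young's classic first-order approximation, interval = sqrt(2 × checkpoint cost × mean time between failures), and it assumes that a "failure" is any moment the spot price rises above the bid. The function names, the evenly sampled price list, and the way the mean time between out-of-bid events is estimated are all illustrative assumptions.

```python
import math


def young_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order optimum checkpoint interval, in seconds.

    checkpoint_cost_s: time to write one checkpoint.
    mtbf_s: mean time between failures (here, out-of-bid events).
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def mean_time_between_outbids(prices, bid, sample_period_s):
    """Estimate the mean running time between out-of-bid events.

    Hypothetical helper: assumes `prices` is an evenly sampled price
    history and that an instance is lost whenever the price moves from
    at-or-below the bid to above it.
    """
    # Samples during which the instance would have been running.
    in_bid_samples = sum(1 for p in prices if p <= bid)
    # Transitions from "running" to "outbid" count as failures.
    outbid_events = sum(
        1 for prev, cur in zip(prices, prices[1:]) if prev <= bid < cur
    )
    if outbid_events == 0:
        return math.inf  # never outbid in this history
    return in_bid_samples * sample_period_s / outbid_events


if __name__ == "__main__":
    # Illustrative, made-up hourly price history (not real market data).
    history = [0.10, 0.10, 0.50, 0.10, 0.10, 0.50]
    mtbf = mean_time_between_outbids(history, bid=0.20, sample_period_s=3600)
    print(young_interval(checkpoint_cost_s=50, mtbf_s=mtbf))
```

Under these assumptions, a higher bid lengthens the estimated time between out-of-bid events and therefore the checkpoint interval, which is the kind of coupling between bidding and checkpointing that the abstract refers to.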