A comparative study of high-performance computing on the cloud

Authors:
Aniruddha Marathe;Rachel Harris;David K. Lowenthal;Bronis R. de Supinski;Barry Rountree;Martin Schulz;Xin Yuan
Affiliations:
University of Arizona, Tucson, AZ, USA;University of Arizona, Tucson, AZ, USA;University of Arizona, Tucson, AZ, USA;Lawrence Livermore National Laboratory, Livermore, CA, USA;Lawrence Livermore National Laboratory, Livermore, CA, USA;Lawrence Livermore National Laboratory, Livermore, CA, USA;Florida State University, Tallahassee, FL, USA
Venue:
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Year:
2013



Abstract

The popularity of Amazon's EC2 cloud platform has increased in recent years. However, many high-performance computing (HPC) users consider dedicated high-performance clusters, typically found in large compute centers such as those in national laboratories, to be far superior to EC2 because of EC2's significant communication overhead. Our view is that this perspective is too narrow and that the proper metrics for comparing high-performance clusters to EC2 are turnaround time and cost. In this paper, we compare the top-of-the-line EC2 cluster to HPC clusters at Lawrence Livermore National Laboratory (LLNL) based on turnaround time and total cost of execution. When measuring turnaround time, we include expected queue wait time on HPC clusters. Our results show that although, as expected, standard HPC clusters are superior in raw performance, EC2 clusters may produce better turnaround times. To estimate cost, we developed a pricing model, relative to EC2's node-hour prices, to set node-hour prices for (currently free) LLNL clusters. We observe that the cost-effectiveness of running an application on a cluster depends on raw performance and application scalability.
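
The two metrics named in the abstract can be illustrated with a minimal sketch. It is not the paper's pricing model, which the abstract does not describe. It assumes only that turnaround time is expected queue wait plus execution time, and that cost is nodes times execution hours times a node-hour price, with queue wait not billed. The class name Run, its fields, and every number below are hypothetical and chosen purely for illustration; none are measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One application run on one cluster (all fields are hypothetical inputs)."""
    nodes: int                # number of nodes used
    exec_hours: float         # raw execution time, in hours
    queue_wait_hours: float   # expected wait before the job starts, in hours
    node_hour_price: float    # price per node-hour, in arbitrary currency units

    def turnaround_hours(self) -> float:
        # Turnaround as described in the abstract: queue wait is included.
        return self.queue_wait_hours + self.exec_hours

    def cost(self) -> float:
        # Assumption: only execution time is billed, not time spent queued.
        return self.nodes * self.exec_hours * self.node_hour_price


# Illustrative values only: a faster cluster with a long queue versus
# a slower on-demand cluster with no queue and a higher node-hour price.
hpc = Run(nodes=64, exec_hours=2.0, queue_wait_hours=6.0, node_hour_price=1.0)
ec2 = Run(nodes=64, exec_hours=3.0, queue_wait_hours=0.0, node_hour_price=2.0)

for name, run in (("HPC", hpc), ("EC2", ec2)):
    print(f"{name}: turnaround {run.turnaround_hours():.1f} h, cost {run.cost():.1f}")
```

With these made-up inputs, the cluster with better raw performance finishes later once queue wait is counted, while the on-demand cluster costs more per run. This is the kind of trade-off the abstract says depends on raw performance and application scalability.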