Recommendations for Virtualization Technologies in High Performance Computing

  • Authors:
  • Nathan Regola; Jean-Christophe Ducom

  • Venue:
  • CLOUDCOM '10: Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science
  • Year:
  • 2010

Abstract

The benefits of virtualization are typically considered to be server consolidation (leading to reduced power and cooling costs), increased availability, isolation, ease of operating system deployment, and simplified disaster recovery. High Performance Computing (HPC) environments pose one main challenge for virtualization: the need to maximize throughput with minimal loss of CPU and I/O efficiency. However, virtualization is usually evaluated with enterprise workloads, under the assumption that servers are underutilized and can be consolidated. In this paper we evaluate the performance of several virtual machine technologies in the context of HPC. A fundamental requirement of current high performance workloads is that both CPU and I/O must be highly efficient for tasks such as MPI jobs. This work benchmarks two virtual machine monitors, OpenVZ and KVM, focusing specifically on I/O throughput, since CPU efficiency has been extensively studied [1]. OpenVZ offers near-native I/O performance. Amazon’s EC2 “Cluster Compute Node” product is also considered for comparative purposes and performs quite well. The EC2 “Cluster Compute Node” product uses the Xen hypervisor in HVM mode and 10 Gbit/s Ethernet for high-throughput communication. We therefore also briefly studied Xen on our hardware platform (in HVM mode) to determine whether there are still areas of improvement in KVM that allow EC2 to outperform KVM (with InfiniBand host channel adapters operating at 20 Gbit/s) in MPI benchmarks. We conclude that KVM’s I/O performance is suboptimal, potentially due to memory management problems in the hypervisor. Amazon’s EC2 service is promising, although further investigation is necessary to understand the effects of network-based storage on I/O throughput in compute nodes.
Amazon’s offering may be attractive to users seeking “InfiniBand-like” performance without the upfront investment required to build an InfiniBand cluster, and to users wishing to expand their cluster dynamically during periods of high demand.
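The abstract judges each platform by how much throughput it loses relative to native hardware ("near-native" for OpenVZ) and contrasts 10 Gbit/s and 20 Gbit/s interconnects. As a minimal sketch, assuming efficiency is expressed as the ratio of virtualized to native throughput (the abstract does not state the paper's exact metric), such a comparison could be computed as follows. The function names are hypothetical helpers, not part of the paper or of any benchmark suite.

```python
# Hypothetical helpers for comparing virtualized and native throughput.
# Assumption: "efficiency" = virtualized throughput / native throughput,
# with both values measured in the same unit (e.g. MB/s).


def relative_efficiency(virtualized: float, native: float) -> float:
    """Return virtualized throughput as a unitless fraction of native throughput.

    A value close to 1.0 corresponds to "near-native" performance.
    """
    if native <= 0:
        raise ValueError("native throughput must be positive")
    return virtualized / native


def overhead_percent(virtualized: float, native: float) -> float:
    """Return the throughput lost to virtualization, as a percentage of native."""
    return (1.0 - relative_efficiency(virtualized, native)) * 100.0


def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a nominal link rate in Gbit/s to GB/s (8 bits per byte)."""
    return gbit_per_s / 8.0


if __name__ == "__main__":
    # Nominal link rates named in the abstract (signaling rates, not measurements).
    print("10 Gbit/s Ethernet   ->", gbit_to_gbyte(10.0), "GB/s")
    print("20 Gbit/s InfiniBand ->", gbit_to_gbyte(20.0), "GB/s")
```

Converted this way, the nominal rates come to 1.25 GB/s for the EC2 Ethernet and 2.5 GB/s for the InfiniBand adapters; achievable payload throughput is lower than these nominal figures. Measured virtualized and native results would be passed to `relative_efficiency` and `overhead_percent`.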