Virtualization is a common strategy for improving the utilization of existing computing resources, particularly within data centers. However, its use for high performance computing (HPC) applications is currently limited, despite its potential both to improve resource utilization and to provide resource guarantees to users. In this article, we systematically evaluate three major virtual machine implementations for computationally intensive HPC applications using various standard benchmarks. Using VMware Server, Xen, and OpenVZ, we examine the suitability of full virtualization (VMware), paravirtualization (Xen), and operating system-level virtualization (OpenVZ) in terms of network utilization, SMP performance, file system performance, and MPI scalability. We show that the operating system-level virtualization of OpenVZ delivers the best overall performance, particularly for MPI scalability. Drawing on the knowledge gained from this evaluation, we extend OpenVZ with support for checkpointing and fault tolerance for MPI-based distributed computing on virtual servers.