An evaluation of multiple communication interfaces for virtualized SMP clusters
Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing
Clusters of small-scale SMP/CMP nodes are becoming increasingly popular due to their cost-effectiveness. Because such nodes can typically support as many network interfaces as CPUs, the question arises of how to configure the cluster for optimal communication performance. This paper evaluates several configurations on a 4-CPU Opteron cluster with multiple Gigabit Ethernet interfaces. The techniques examined include channel bonding and independent communication pathways. With the latter, virtualization via the Xen Virtual Machine Monitor offers the best potential to parallelize all stages of message transmission when multiple CPUs on a node communicate simultaneously. Network-level microbenchmarks indicate that the best performance is achieved in a configuration where guest virtual machines running on each CPU communicate directly with a dedicated interface, bypassing the virtual machine monitor. Channel bonding also proved more effective over multiple communication streams than over a single one.
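As a rough illustration of the channel-bonding technique evaluated in the abstract, the sketch below aggregates two Gigabit Ethernet interfaces into one logical link using the Linux bonding driver. The interface names (`eth0`, `eth1`), the round-robin bonding mode, and the address are illustrative assumptions, not the paper's exact configuration; the commands require root privileges and the named NICs to exist.

```shell
# Hedged sketch: bond two GigE NICs under Linux (assumed names eth0, eth1).
# balance-rr stripes packets round-robin across both links, which is one
# way to expose aggregate bandwidth to multiple concurrent streams.
ip link add bond0 type bond mode balance-rr

# Slaves must be down before they can be enslaved to the bond.
ip link set eth0 down
ip link set eth0 master bond0
ip link set eth1 down
ip link set eth1 master bond0

# Illustrative address; substitute the cluster's actual subnet.
ip addr add 192.168.1.10/24 dev bond0
ip link set bond0 up
```

The alternative the paper favors, independent communication pathways, would instead leave each interface unbonded and dedicate it to one guest virtual machine, so that each CPU's traffic traverses its own NIC without sharing a logical link.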