Exploiting 162-Nanosecond End-to-End Communication Latency on Anton
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Scalable memory registration for high performance networks using helper threads
Proceedings of the 8th ACM International Conference on Computing Frontiers
Hi-index | 0.00 |
Clusters based on commodity components continue to be very popular for high-performance computing (HPC).These clusters must be careful to balance both computational as well as I/O requirements of applications. This I/O requirement is generally fulfilled by a high-speed interconnect such as InfiniBand. The balance of computational and I/O performance is often changing, with the latest change being made by the Intel "Nehalem" architecture that can dramatically increase computing power.In this paper we explore how this balance has changed and how different speeds of InfiniBand interconnects including Double Data Rate (DDR) and Quad Data Rate(QDR) InfiniBand HCAs. We explore micro benchmarks, the "communication balance" ratio of intra-node to inter-node performance as well as end application performance. We show up to 10% improvement when using a QDR interconnect for Nehalem systems versus a DDR interconnection the NAS Parallel Benchmarks. We also see up to 25% performance gain with the HPCC randomly ordered ring bandwidth benchmark.