Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms

Authors:
Sayantan Sur; Matthew J. Koop; Lei; Dhabaleswar K. Panda
Affiliations:
Ohio State University; Ohio State University; Ohio State University; Ohio State University
Venue:
HOTI '07 Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects
Year:
2007

Citing 0
Cited 3

Performance implications of virtualizing multicore cluster machines
Proceedings of the 2nd workshop on System-level virtualization for high performance computing

Design optimization of a highly parallel InfiniBand host channel adapter
Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems

Architectural support for user-level network interfaces in heavily virtualized systems
WIOV'10 Proceedings of the 2nd conference on I/O virtualization

Abstract

InfiniBand is an emerging networking technology that is gaining rapid acceptance in the HPC domain. Currently, several systems in the Top500 list use InfiniBand as their primary interconnect, with more planned for the near future. The fundamental architecture of these systems is undergoing a sea change due to the advent of commodity multi-core computing. Because each compute node now runs more processes, the network interface is expected to handle more communication traffic than in older dual or quad SMP systems. Thus, the network architecture should provide scalable performance as the number of processing cores increases. ConnectX is the fourth-generation InfiniBand adapter from Mellanox Technologies. Its novel architecture enhances the scalability and performance of InfiniBand on multi-core clusters. In this paper, we carry out an in-depth performance analysis of the ConnectX architecture, comparing it with the third-generation InfiniHost III architecture on the Intel Bensley platform with dual Clovertown processors. Our analysis reveals that the aggregate bandwidth for small and medium-sized messages can be increased by a factor of 10 compared to the third-generation InfiniHost III adapters. Similarly, RDMA-Write and RDMA-Read latencies for 1-byte messages can be reduced by factors of 6 and 3, respectively, even when all cores are communicating simultaneously. Evaluation with the Halo communication kernel reveals a performance benefit of a factor of 2 to 5. Finally, the performance of LAMMPS, a molecular dynamics simulator, is improved by 10% for the in.rhodo benchmark.