Performance evaluation of offloading software modules to cluster network
PDCN'07 Proceedings of the 25th IASTED International Multi-Conference: Parallel and Distributed Computing and Networks
A speculative and adaptive MPI rendezvous protocol over RDMA-enabled interconnects
International Journal of Parallel Programming
Using triggered operations to offload collective communication operations
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Many modern networks used to interconnect nodes in cluster-based computing systems provide network interface cards (NICs) with programmable processors. Substantial research has focused on offloading processing from the host to the NIC processor. However, this work has primarily addressed the static offload of specific features to the NIC, mainly to optimize common collective and synchronization-based communications. We describe the design and implementation of a framework based on MPICH-GM that supports the dynamic NIC-based offload of user-defined modules for Myrinet clusters. We evaluate our implementation on a 16-node cluster using a NIC-based version of the common broadcast operation. Under conditions of process skew, we find a maximum improvement factor of 1.2 in total latency and a maximum improvement factor of 2.2 in average CPU utilization. In addition, these improvements increase with system size, indicating that our NIC-based framework scales better than a purely host-based approach.
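To illustrate why process skew can favor NIC-based forwarding, the following is a minimal toy model, not the framework described above. It assumes a binomial broadcast tree rooted at rank 0, a fixed per-hop cost, and serialized sends; the abstract does not specify any of these. The function names `binomial_children` and `broadcast_finish_times` are hypothetical, and the model does not attempt to reproduce the reported factors of 1.2 and 2.2.

```python
def binomial_children(rank, size):
    """Children of `rank` in a binomial broadcast tree rooted at rank 0 (assumed topology)."""
    children = []
    mask = 1
    while mask < size:
        if rank < mask and rank + mask < size:
            children.append(rank + mask)
        mask <<= 1
    return children


def broadcast_finish_times(skew, hop=1.0, nic_forwarding=False):
    """Time at which each rank's host holds the broadcast data.

    skew[r] is the time at which the host of rank r enters the broadcast call.
    Host-based: a rank forwards only after its host has entered the call and
    the message has arrived. NIC-based (assumed behavior): the NIC forwards as
    soon as the message arrives, regardless of the host.
    """
    size = len(skew)
    arrive = [0.0] * size
    arrive[0] = float(skew[0])  # the root's data exists once the root calls broadcast
    done = [0.0] * size
    # Every parent has a lower rank than its children, so ascending order is safe.
    for rank in range(size):
        host_ready = max(arrive[rank], float(skew[rank]))
        done[rank] = host_ready
        forward_from = arrive[rank] if nic_forwarding else host_ready
        for k, child in enumerate(binomial_children(rank, size)):
            # Sends are serialized: the k-th child receives after k + 1 hops.
            arrive[child] = forward_from + (k + 1) * hop
    return done


if __name__ == "__main__":
    skew = [0.0] * 16
    skew[1] = 10.0  # rank 1 enters the broadcast late
    host = broadcast_finish_times(skew, nic_forwarding=False)
    nic = broadcast_finish_times(skew, nic_forwarding=True)
    print("host-based finish:", max(host))
    print("NIC-based finish: ", max(nic))
```

In this model a late host delays its whole subtree when forwarding is host-based, whereas with NIC forwarding only the late rank itself finishes late. Real NIC processors are slower than host CPUs and add their own processing cost, which this sketch ignores.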