An architecture to perform NIC based MPI matching

Authors:
K. Scott Hemmert;Keith D. Underwood;Arun Rodrigues
Affiliations:
Sandia National Laboratories, P.O. Box 5800, MS-1319, Albuquerque, NM 87185-1319, USA;Sandia National Laboratories, P.O. Box 5800, MS-1319, Albuquerque, NM 87185-1319, USA;Sandia National Laboratories, P.O. Box 5800, MS-1319, Albuquerque, NM 87185-1319, USA
Venue:
CLUSTER '07 Proceedings of the 2007 IEEE International Conference on Cluster Computing
Year:
2007

Abstract

Modern supercomputers aggregate thousands of microprocessors through a high performance network. Many of these systems place a processor on the network interface controller (NIC) to handle some portion of the MPI processing. This processing involves traversing a linked list and invoking a matching function for each item. Although this task is critical to the performance of the system, microprocessors perform it extremely poorly. Furthermore, the traditional network processor approaches of multicore and multithreading map poorly to the problem because the list is a shared data structure. While match processing can be implemented directly in hardware, hardware implementations can be extremely inflexible and lead to extremely high risk. This paper presents a novel, programmable architecture for a processor to handle the matching function. The matching engine approaches the performance of a direct hardware implementation while maintaining a high degree of flexibility and programmability. More importantly, it requires a dramatically smaller area than a conventional processor.
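
The list traversal described in the abstract can be sketched in software as follows. This is a minimal illustration of conventional MPI receive matching (communicator context, source rank, and tag, with wildcards on source and tag). It is not the matching engine the paper proposes. All names here (`PostedReceive`, `Header`, `PostedQueue`, `ANY_SOURCE`, `ANY_TAG`) are hypothetical and chosen only for this example; the wildcard values stand in for `MPI_ANY_SOURCE` and `MPI_ANY_TAG`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical wildcard values, standing in for MPI_ANY_SOURCE / MPI_ANY_TAG.
ANY_SOURCE = -1
ANY_TAG = -1


@dataclass(eq=False)
class PostedReceive:
    """One entry in the posted-receive list (assumed layout)."""
    context: int                      # communicator context id
    source: int                       # expected sender rank, or ANY_SOURCE
    tag: int                          # expected tag, or ANY_TAG
    next: Optional["PostedReceive"] = None


@dataclass
class Header:
    """Matching fields carried by an incoming message (assumed layout)."""
    context: int
    source: int
    tag: int


def matches(recv: PostedReceive, hdr: Header) -> bool:
    """The per-item matching function: context must be equal,
    source and tag must be equal unless the receive used a wildcard."""
    return (recv.context == hdr.context
            and recv.source in (ANY_SOURCE, hdr.source)
            and recv.tag in (ANY_TAG, hdr.tag))


class PostedQueue:
    """Singly linked list of posted receives, kept in posting order."""

    def __init__(self) -> None:
        self.head: Optional[PostedReceive] = None
        self.tail: Optional[PostedReceive] = None

    def post(self, recv: PostedReceive) -> None:
        """Append a receive at the tail so older receives match first."""
        recv.next = None
        if self.tail is None:
            self.head = recv
        else:
            self.tail.next = recv
        self.tail = recv

    def match_and_remove(self, hdr: Header) -> Tuple[Optional[PostedReceive], int]:
        """Walk the list from the head, unlink and return the first match.
        Also returns how many entries were visited."""
        prev = None
        node = self.head
        visited = 0
        while node is not None:
            visited += 1
            if matches(node, hdr):
                # Unlink the matched entry from the list.
                if prev is None:
                    self.head = node.next
                else:
                    prev.next = node.next
                if node is self.tail:
                    self.tail = prev
                node.next = None
                return node, visited
            prev, node = node, node.next
        return None, visited
```

In this sketch every incoming header walks the list from the head, and each step depends on the pointer loaded in the previous one. Because a match also modifies the list, concurrent searches would have to coordinate on the same structure, which is consistent with the abstract's point that the shared list maps poorly to multicore and multithreaded designs.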