High-speed networking in clusters usually relies on advanced hardware features in the NICs, such as zero-copy. Open-MX is a high-performance message-passing stack designed for regular Ethernet hardware without such capabilities. We present the addition of multiqueue support to the Open-MX receive stack so that all incoming packets for the same process are processed on the same core. We then introduce the idea of binding the target process close to its dedicated receive queue. This model leads to a more cache-efficient receive stack for Open-MX. It also shows that very simple, stateless hardware features can have a significant impact on message-passing performance over Ethernet. Implementing this model in firmware reveals that it may not be as efficient as some manually tuned micro-benchmarks. However, our multiqueue receive stack generally performs better than the original single-queue stack, especially for large communication patterns where multiple processes are involved and manual binding is difficult.
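The receive model described above can be illustrated with a minimal sketch. It is not the Open-MX or firmware implementation: the `Packet` fields, the modulo-based `select_queue` rule, and the `queue_to_core` table are all hypothetical and chosen only to show how a stateless choice based on the destination endpoint keeps every packet for one process on one queue, and hence on one core.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class Packet:
    dst_endpoint: int  # hypothetical header field: index of the receiving process
    seq: int           # hypothetical sequence number, used only for illustration


def select_queue(pkt: Packet, num_queues: int) -> int:
    """Stateless queue choice: depends only on a header field, never on past packets."""
    return pkt.dst_endpoint % num_queues


def core_for_queue(queue: int, queue_to_core: Sequence[int]) -> int:
    """Assumed static mapping from a receive queue to the core that services it."""
    return queue_to_core[queue]


def dispatch(packets: Iterable[Packet], num_queues: int) -> List[List[Packet]]:
    """Place each packet in the queue selected for its destination endpoint."""
    queues: List[List[Packet]] = [[] for _ in range(num_queues)]
    for pkt in packets:
        queues[select_queue(pkt, num_queues)].append(pkt)
    return queues


if __name__ == "__main__":
    # Six endpoints, four packets each, spread over four receive queues.
    packets = [Packet(dst_endpoint=e, seq=s) for s in range(4) for e in range(6)]
    queue_to_core = [0, 2, 4, 6]  # hypothetical queue-to-core assignment
    for endpoint in range(6):
        queue = select_queue(Packet(endpoint, 0), len(queue_to_core))
        core = core_for_queue(queue, queue_to_core)
        print(f"endpoint {endpoint}: queue {queue}, suggested core {core}")
```

Under these assumptions, the core returned by `core_for_queue` is the one a receiving process would be bound near. On Linux, a process could pin itself there with `os.sched_setaffinity(0, {core})`. Whether that exact core or a neighbouring one sharing a cache is the better choice depends on the machine topology.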