Ethernet line rates are projected to reach 100 Gbits/s as soon as 2010. While in principle suitable for high-performance clustered and parallel applications, Ethernet requires matching improvements in the system software stack. In this paper we address several sources of CPU and memory-system overhead in the I/O path at line rates reaching 80 Gbits/s (bidirectional), using multiple 10 Gbit/s links per system node. The key contribution of our work is the design of a parallel, high-performance communication protocol that uses context-independent page remapping to (a) reduce packet-processing overheads; (b) reduce thread-management and synchronization overheads; and (c) address affinity issues on NUMA multicore CPUs. Our design delivers the full 40 Gbits/s of available one-way Ethernet bandwidth and 57.6 Gbits/s (72%) of the 80 Gbits/s maximum bidirectional throughput (limited only by the memory system), while leaving ample CPU cycles for application processing.