Modern cloud computing infrastructures are steadily pushing the performance of their network stacks. At the hardware level, some cloud providers have already upgraded parts of their network to 10GbE. At the same time, there is a continuous effort within the cloud community to improve network performance inside the virtualization layers. The low-latency, high-throughput properties of these network interfaces not only open the cloud to HPC applications; they will also be welcomed by traditional large-scale web applications and data processing frameworks. However, as commodity networks get faster, the burden on the end hosts increases. Inefficient memory copying in socket-based networking accounts for a significant fraction of the end-to-end latency and also creates serious CPU load on the host machine. Years ago, the supercomputing community developed RDMA network stacks such as InfiniBand that offer both low end-to-end latency and a low CPU footprint. While adapting RDMA to the commodity cloud environment is difficult (it is costly and requires special hardware), we argue in this paper that most of the benefits of RDMA can in fact be provided in software. To demonstrate our findings, we have implemented and evaluated a prototype of a software-based RDMA stack. Compared to a socket/TCP approach (with TCP receive copy offload), our results show a significant reduction in end-to-end latency for messages larger than a modest 64 KB, and a reduction in CPU load (without TCP receive copy offload) that yields better efficiency while saturating the 10 Gbit/s link.
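To make the contrast with socket-based networking concrete, the sketch below shows what a one-sided RDMA read looks like at the verbs API level, the interface exposed by hardware RDMA NICs and software RDMA stacks alike. It is a minimal illustration, not code from the paper: the connected queue pair, completion queue, registered memory region, and the out-of-band exchange of the remote buffer address and rkey are all assumed to have been set up already, and the helper name rdma_read is hypothetical.

```c
/* Minimal sketch of a one-sided RDMA read via libibverbs.
 * Assumes (hypothetically) an already-connected QP, a completion
 * queue, a registered local buffer, and a remote address/rkey
 * exchanged out of band. */
#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

/* Read 'len' bytes from the remote buffer directly into local memory.
 * The remote application is not involved in serving the request, and
 * the data lands in the registered local buffer without an
 * intermediate socket-style receive copy. */
int rdma_read(struct ibv_qp *qp, struct ibv_cq *cq,
              struct ibv_mr *mr, void *local_buf, size_t len,
              uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_READ;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* Busy-poll for the completion; a production stack would
     * block on a completion channel or batch completions instead. */
    struct ibv_wc wc;
    int n;
    while ((n = ibv_poll_cq(cq, 1, &wc)) == 0)
        ;
    return (n < 0 || wc.status != IBV_WC_SUCCESS) ? -1 : 0;
}
```

The key point of the sketch is that the transfer bypasses the receive-side socket path entirely: whether the request is served by a NIC or by a soft-RDMA kernel path, the payload is placed directly into the pre-registered buffer, which is where the latency and CPU savings over socket/TCP discussed above come from.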