Memcached Design on High Performance RDMA Capable Interconnects

  • Authors:
  • Jithin Jose; Hari Subramoni; Miao Luo; Minjia Zhang; Jian Huang; Md. Wasi-ur-Rahman; Nusrat S. Islam; Xiangyong Ouyang; Hao Wang; Sayantan Sur; Dhabaleswar K. Panda

  • Venue:
  • ICPP '11: Proceedings of the 2011 International Conference on Parallel Processing
  • Year:
  • 2011

Abstract

Memcached is a distributed in-memory key-value object caching system. It is widely used in data-center environments for caching the results of database calls, API calls, or any other data. Using Memcached, spare memory in data-center servers can be aggregated to speed up lookups of frequently accessed information. The performance of Memcached is directly related to the underlying networking technology, as workloads are often latency sensitive. The existing Memcached implementation is built on the BSD Sockets interface. Sockets offers byte-stream-oriented semantics, so every operation requires a conversion between Memcached's memory-object semantics and Sockets' byte-stream semantics, imposing an overhead. This is in addition to any extra memory copies in the Sockets implementation within the OS. Over the past decade, high-performance interconnects have employed Remote Direct Memory Access (RDMA) technology to provide excellent performance for the scientific computing domain. In addition to its high raw performance, the memory-based semantics of RDMA fit very well with Memcached's memory-object model. While the Sockets interface can be ported to use RDMA, it is not very efficient compared with low-level RDMA APIs. In this paper, we describe a novel design of Memcached for RDMA-capable networks. Our design extends the existing open-source Memcached software and makes it RDMA capable. We provide a detailed performance comparison of our Memcached design against unmodified Memcached using Sockets over RDMA and over a 10 Gigabit Ethernet network with hardware-accelerated TCP/IP. Our performance evaluation reveals that the latency of a 4 KB Memcached Get can be brought down to 12 µs using ConnectX InfiniBand QDR adapters. The latency of the same operation using older-generation DDR adapters is about 20 µs. These numbers are about a factor of four better than the performance obtained using 10 GigE with TCP Offload. In addition, the latencies of Get requests over a range of message sizes are better by a factor of five to ten compared to IP over InfiniBand and Sockets Direct Protocol over InfiniBand. Further, the throughput of small Get operations can be improved by a factor of six when compared to Sockets over a 10 Gigabit Ethernet network. A similar factor-of-six improvement in throughput is observed over Sockets Direct Protocol using ConnectX QDR adapters. To the best of our knowledge, this is the first such Memcached design on high-performance RDMA-capable interconnects.
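
The conversion overhead the abstract describes comes from Memcached's standard client path: every Get must be serialized into the wire protocol, pushed through the kernel socket stack, and then parsed back out of the byte stream on the receiving side. The sketch below is illustrative only and is not taken from the paper: it issues a single Get over the well-known Memcached ASCII protocol against an assumed local server at 127.0.0.1:11211 (server location and the key "mykey" are placeholders), to make the byte-stream framing concrete. In the paper's RDMA-capable design, such operations are instead mapped onto RDMA's memory-based semantics using low-level RDMA APIs, avoiding this framing and the socket-level copies.

/*
 * Illustrative sketch, not taken from the paper: one Memcached Get issued
 * over the standard ASCII protocol on a plain TCP socket.  The request must
 * be serialized into a byte stream and the object parsed back out of the
 * "VALUE ... END" framing in the reply -- the byte-stream/memory-object
 * conversion the abstract refers to.  Assumes a memcached server is
 * listening on 127.0.0.1:11211; the key "mykey" is a placeholder.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(11211);               /* default memcached port */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    /* Serialize the Get request into the ASCII wire format. */
    const char *req = "get mykey\r\n";
    if (write(fd, req, strlen(req)) < 0) { perror("write"); return 1; }

    /*
     * Read the reply and recover the object from the byte stream.  A real
     * client must loop over partial reads; a single read keeps the sketch
     * short.  A hit looks like:
     *   VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n
     */
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n <= 0) { perror("read"); close(fd); return 1; }
    buf[n] = '\0';

    char key[251];
    unsigned flags;
    size_t bytes;
    if (sscanf(buf, "VALUE %250s %u %zu", key, &flags, &bytes) == 3) {
        char *payload = strstr(buf, "\r\n");      /* skip the VALUE header */
        if (payload && (size_t)(n - (payload + 2 - buf)) >= bytes)
            printf("got %zu-byte object for %s: %.*s\n",
                   bytes, key, (int)bytes, payload + 2);
    } else {
        printf("miss or error: %s", buf);         /* e.g. "END\r\n" */
    }

    close(fd);
    return 0;
}

This can be compiled with, e.g., gcc -Wall get_sketch.c and run against a local server started with memcached -p 11211 (file name is hypothetical). The per-request framing and parsing shown here, together with the copies inside the OS socket stack, are the costs the paper argues an RDMA-based design can avoid, which is consistent with the 12 µs 4 KB Get latency reported in the abstract for ConnectX QDR adapters.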