Memcached Design on High Performance RDMA Capable Interconnects

  • Authors:
  • Jithin Jose; Hari Subramoni; Miao Luo; Minjia Zhang; Jian Huang; Md. Wasi-ur-Rahman; Nusrat S. Islam; Xiangyong Ouyang; Hao Wang; Sayantan Sur; Dhabaleswar K. Panda

  • Venue:
  • ICPP '11: Proceedings of the 2011 International Conference on Parallel Processing
  • Year:
  • 2011

Abstract

Memcached is a distributed in-memory key-value object caching system. It is widely used in data-center environments for caching the results of database calls, API calls, or any other data. Using Memcached, spare memory in data-center servers can be aggregated to speed up lookups of frequently accessed information. The performance of Memcached is directly related to the underlying networking technology, as workloads are often latency sensitive. The existing Memcached implementation is built on the BSD Sockets interface. Sockets offers byte-stream-oriented semantics, so every operation requires a conversion between Memcached's memory-object semantics and Sockets' byte-stream semantics, imposing an overhead. This is in addition to any extra memory copies in the Sockets implementation within the OS. Over the past decade, high-performance interconnects have employed Remote Direct Memory Access (RDMA) technology to provide excellent performance for the scientific computing domain. In addition to its high raw performance, the memory-based semantics of RDMA fit very well with Memcached's memory-object model. While the Sockets interface can be ported to use RDMA, it is not very efficient compared with low-level RDMA APIs. In this paper, we describe a novel design of Memcached for RDMA-capable networks. Our design extends the existing open-source Memcached software and makes it RDMA capable. We provide a detailed performance comparison of our Memcached design against unmodified Memcached using Sockets over RDMA and over a 10 Gigabit Ethernet network with hardware-accelerated TCP/IP. Our performance evaluation reveals that the latency of a 4 KB Memcached Get can be brought down to 12 µs using ConnectX InfiniBand QDR adapters. The latency of the same operation using older-generation DDR adapters is about 20 µs. These numbers are about a factor of four better than the performance obtained using 10 GigE with TCP Offload. In addition, the latencies of Get requests over a range of message sizes are better by a factor of five to ten compared to IP over InfiniBand and Sockets Direct Protocol over InfiniBand. Further, the throughput of small Get operations can be improved by a factor of six when compared to Sockets over a 10 Gigabit Ethernet network. A similar factor-of-six improvement in throughput is observed over Sockets Direct Protocol using ConnectX QDR adapters. To the best of our knowledge, this is the first such Memcached design on high-performance RDMA-capable interconnects.
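
The conversion overhead the abstract describes comes from Memcached's standard client path: every Get must be serialized into the wire protocol, pushed through the kernel socket stack, and then parsed back out of the byte stream on the receiving side. The sketch below is illustrative only and is not taken from the paper: it issues a single Get over the well-known Memcached ASCII protocol against an assumed local server at 127.0.0.1:11211 (server location and the key "mykey" are placeholders), to make the byte-stream framing concrete. In the paper's RDMA-capable design, such operations are instead mapped onto RDMA's memory-based semantics using low-level RDMA APIs, avoiding this framing and the socket-level copies.

/*
 * Illustrative sketch, not taken from the paper: one Memcached Get issued
 * over the standard ASCII protocol on a plain TCP socket.  The request must
 * be serialized into a byte stream and the object parsed back out of the
 * "VALUE ... END" framing in the reply -- the byte-stream/memory-object
 * conversion the abstract refers to.  Assumes a memcached server is
 * listening on 127.0.0.1:11211; the key "mykey" is a placeholder.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(11211);               /* default memcached port */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    /* Serialize the Get request into the ASCII wire format. */
    const char *req = "get mykey\r\n";
    if (write(fd, req, strlen(req)) < 0) { perror("write"); return 1; }

    /*
     * Read the reply and recover the object from the byte stream.  A real
     * client must loop over partial reads; a single read keeps the sketch
     * short.  A hit looks like:
     *   VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n
     */
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n <= 0) { perror("read"); close(fd); return 1; }
    buf[n] = '\0';

    char key[251];
    unsigned flags;
    size_t bytes;
    if (sscanf(buf, "VALUE %250s %u %zu", key, &flags, &bytes) == 3) {
        char *payload = strstr(buf, "\r\n");      /* skip the VALUE header */
        if (payload && (size_t)(n - (payload + 2 - buf)) >= bytes)
            printf("got %zu-byte object for %s: %.*s\n",
                   bytes, key, (int)bytes, payload + 2);
    } else {
        printf("miss or error: %s", buf);         /* e.g. "END\r\n" */
    }

    close(fd);
    return 0;
}

This can be compiled with, e.g., gcc -Wall get_sketch.c and run against a local server started with memcached -p 11211 (file name is hypothetical). The per-request framing and parsing shown here, together with the copies inside the OS socket stack, are the costs the paper argues an RDMA-based design can avoid, which is consistent with the 12 µs 4 KB Get latency reported in the abstract for ConnectX QDR adapters.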