Integrated 3D-stacked server designs for increasing physical density of key-value stores

  • Authors:
  • Anthony Gutierrez;Michael Cieslak;Bharan Giridhar;Ronald G. Dreslinski;Luis Ceze;Trevor Mudge

  • Affiliations:
  • University of Michigan, Ann Arbor, MI, USA;University of Michigan, Ann Arbor, MI, USA;University of Michigan, Ann Arbor, MI, USA;University of Michigan, Ann Arbor, MI, USA;University of Washington, Seattle, WA, USA;University of Michigan, Ann Arbor, MI, USA

  • Venue:
  • Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
  • Year:
  • 2014

Quantified Score

Hi-index 0.00

Visualization

Abstract

Key-value stores, such as Memcached, have been used to scale web services since the beginning of the Web 2.0 era. Data center real estate is expensive, and several industry experts we have spoken to have suggested that a significant portion of their data center space is devoted to key value stores. Despite its wide-spread use, there is little in the way of hardware specialization for increasing the efficiency and density of Memcached; it is currently deployed on commodity servers that contain high-end CPUs designed to extract as much instruction-level parallelism as possible. Out-of-order CPUs, however have been shown to be inefficient when running Memcached. To address Memcached efficiency issues, we propose two architectures using 3D stacking to increase data storage efficiency. Our first 3D architecture, Mercury, consists of stacks of ARM Cortex-A7 cores with 4GB of DRAM, as well as NICs. Our second architecture, Iridium, replaces DRAM with NAND Flash to improve density. We explore, through simulation, the potential efficiency benefits of running Memcached on servers that use 3D-stacking to closely integrate low-power CPUs with NICs and memory. With Mercury we demonstrate that density may be improved by 2.9X, power efficiency by 4.9X, throughput by 10X, and throughput per GB by 3.5X over a state-of-the-art server running optimized Memcached. With Iridium we show that density may be increased by 14X, power efficiency by 2.4X, and throughput by 5.2X, while still meeting latency requirements for a majority of requests.