Key-value stores, such as Memcached, have been used to scale web services since the beginning of the Web 2.0 era. Data center real estate is expensive, and several industry experts we have spoken to have suggested that a significant portion of their data center space is devoted to key-value stores. Despite Memcached's widespread use, there is little hardware specialization for increasing its efficiency and density; it is currently deployed on commodity servers that contain high-end CPUs designed to extract as much instruction-level parallelism as possible. Out-of-order CPUs, however, have been shown to be inefficient when running Memcached. To address these efficiency issues, we propose two architectures that use 3D stacking to increase data storage efficiency. Our first 3D architecture, Mercury, consists of stacks of ARM Cortex-A7 cores with 4GB of DRAM, as well as NICs. Our second architecture, Iridium, replaces DRAM with NAND Flash to improve density. We explore, through simulation, the potential efficiency benefits of running Memcached on servers that use 3D stacking to closely integrate low-power CPUs with NICs and memory. With Mercury we demonstrate that density may be improved by 2.9X, power efficiency by 4.9X, throughput by 10X, and throughput per GB by 3.5X over a state-of-the-art server running optimized Memcached. With Iridium we show that density may be increased by 14X, power efficiency by 2.4X, and throughput by 5.2X, while still meeting latency requirements for a majority of requests.