Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems

Authors:
Tayler H. Hetherington;Timothy G. Rogers;Lisa Hsu;Mike O'Connor;Tor M. Aamodt
Affiliations:
Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, CANADA;Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, CANADA;Advanced Micro Devices, Inc. (AMD), USA;Advanced Micro Devices, Inc. (AMD), USA;Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, CANADA
Venue:
ISPASS '12 Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software
Year:
2012

Abstract

The recent use of graphics processing units (GPUs) in several top supercomputers demonstrates their ability to consistently deliver positive results in high-performance computing (HPC). GPU support for significant amounts of parallelism would seem to make them strong candidates for non-HPC applications as well. Server workloads are inherently parallel; however, at first glance they may not seem suitable to run on GPUs due to their irregular control flow and memory access patterns. In this work, we evaluate a widely used key-value store middleware application, Memcached, on recent integrated and discrete CPU+GPU heterogeneous hardware and characterize the resulting performance. To gain greater insight, we also evaluate Memcached's performance on a GPU simulator. This work explores the challenges in porting Memcached to OpenCL and provides a detailed analysis of Memcached's behavior on a GPU to better explain the performance results observed on physical hardware. On the integrated CPU+GPU systems, we observe a performance increase of up to 7.5X compared to the CPU when executing the key-value look-up handler on the GPU.