Power efficiency evaluation of block ciphers on GPU-integrated multicore processor
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Cache-Conscious Wavefront Scheduling
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
MemC3: compact and concurrent MemCache with dumber caching and smarter hashing
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Thin servers with smart pipes: designing SoC accelerators for memcached
Proceedings of the 40th Annual International Symposium on Computer Architecture
Revisiting co-processing for hash joins on the coupled CPU-GPU architecture
Proceedings of the VLDB Endowment
Divergence-aware warp scheduling
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Rhythm: harnessing data parallel hardware for server workloads
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
KV-Cache: A Scalable High-Performance Web-Object Cache for Manycore
UCC '13 Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing
Hi-index | 0.00 |
The recent use of graphics processing units (GPUs) in several top supercomputers demonstrate their ability to consistently deliver positive results in high-performance computing (HPC). GPU support for significant amounts of parallelism would seem to make them strong candidates for non-HPC applications as well. Server workloads are inherently parallel; however, at first glance they may not seem suitable to run on GPUs due to their irregular control flow and memory access patterns. In this work, we evaluate the performance of a widely used key-value store middleware application, Memcached, on recent integrated and discrete CPU+GPU heterogeneous hardware and characterize the resulting performance. To gain greater insight, we also evaluate Memcached's performance on a GPU simulator. This work explores the challenges in porting Memcached to OpenCL and provides a detailed analysis into Memcached's behavior on a GPU to better explain the performance results observed on physical hardware. On the integrated CPU+GPU systems, we observe up to 7.5X performance increase compared to the CPU when executing the key-value look-up handler on the GPU.