Rhythm: harnessing data parallel hardware for server workloads

  • Authors:
  • Sandeep R. Agrawal;Valentin Pistol;Jun Pang;John Tran;David Tarjan;Alvin R. Lebeck

  • Affiliations:
  • Duke University, Durham, USA;Duke University, Durham, USA;Duke University, Durham, USA;NVIDIA Corporation, Santa Clara, USA;NVIDIA Corporation, Santa Clara, USA;Duke University, Durham, USA

  • Venue:
  • Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
  • Year:
  • 2014

Quantified Score

Hi-index 0.00

Visualization

Abstract

Trends in increasing web traffic demand an increase in server throughput while preserving energy efficiency and total cost of ownership. Present work in optimizing data center efficiency primarily focuses on the data center as a whole, using off-the-shelf hardware for individual servers. Server capacity is typically increased by adding more machines, which is cheap, though inefficient in the long run in terms of energy and area. Our work builds on the observation that server workload execution patterns are not completely unique across multiple requests. We present a framework---called Rhythm---for high throughput servers that can exploit similarity across requests to improve server performance and power/energy efficiency by launching data parallel executions for request cohorts. An implementation of the SPECWeb Banking workload using Rhythm on NVIDIA GPUs provides a basis for evaluating both software and hardware for future cohort-based servers. Our evaluation of Rhythm on future server platforms shows that it achieves 4x the throughput (reqs/sec) of a core i7 at efficiencies (reqs/Joule) comparable to a dual core ARM Cortex A9. A Rhythm implementation that generates transposed responses achieves 8x the i7 throughput while processing 2.5x more requests/Joule compared to the A9.