APE: accelerator processor extensions to optimize data-compute co-location

  • Author: Ganesh Venkatesh
  • Affiliation: Intel Labs, Hillsboro, Oregon
  • Venue: Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
  • Year: 2013


Abstract

Two technological trends in current-day systems are the march toward many-core designs and a greater focus on power efficiency. Rising core counts mean smaller caches per compute node and greater reliance on exposing task-level parallelism in applications. However, this potentially increases the amount of data that moves within and between tasks, and hence the associated power cost, placing a new burden on already power-constrained systems. The situation will only worsen going forward: the power consumed by wires is not scaling down much with each technology generation, yet the amount of data those wires move grows with every generation. This paper addresses the concern by identifying the memory access patterns that account for much of the data movement and designing processor extensions, APEs, to support them. These extensions are placed close to the cache structures rather than the core pipeline, reducing data movement and improving compute-data co-location. We show that this reduces a task's memory accesses by ~2.5×, its data movement by 4×, and its cache miss rate by 40% across a wide range of applications.
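To make the motivation concrete, below is a minimal C sketch of the kind of data-movement-heavy loop such extensions would target. The abstract does not name the specific access patterns or any APE interface, so the linked-list layout and the reduction loop here are illustrative assumptions, not the paper's mechanism.

```c
/* Hypothetical example of a pointer-chasing reduction. On a conventional
 * core, every iteration drags a full cache line through the hierarchy to
 * the pipeline just to read `value` and `next`; a near-cache unit of the
 * kind the abstract describes could run the traversal in place and return
 * only the final sum, cutting data movement. */
#include <stdio.h>
#include <stdlib.h>

struct node {
    long value;
    struct node *next;
    char payload[48];   /* rest of the line, untouched by the loop */
};

long sum_list(const struct node *head)
{
    long sum = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        sum += n->value;   /* only 16 of ~64 bytes per line are used */
    return sum;
}

int main(void)
{
    enum { N = 1000 };
    struct node *nodes = malloc(N * sizeof *nodes);
    if (nodes == NULL)
        return 1;
    for (int i = 0; i < N; i++) {
        nodes[i].value = i;
        nodes[i].next = (i + 1 < N) ? &nodes[i + 1] : NULL;
    }
    printf("sum = %ld\n", sum_list(nodes));   /* prints 499500 */
    free(nodes);
    return 0;
}
```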