Low-Latency Mechanisms for Near-Threshold Operation of Private Caches in Shared Memory Multicores

  • Authors:
  • Farrukh Hijaz; Qingchuan Shi; Omer Khan

  • Venue:
  • MICROW '12: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture Workshops
  • Year:
  • 2012

Abstract

Near-threshold voltage operation is widely acknowledged as a potential mechanism to achieve an order-of-magnitude reduction in energy consumption in future processors. However, processors cannot operate reliably below a minimum voltage, Vccmin, since hardware components may fail. SRAM bitcell failures in memory structures, such as caches, typically determine the Vccmin for a processor. Although the last-level shared caches (LLC) in modern multicores are protected using error correcting codes (ECC), the private caches have been left unprotected because their performance is sensitive to the latency overhead of ECC. This limits the operation of the processor at near-threshold voltages.

In this paper, we propose mechanisms for near-threshold operation of private caches that do not require ECC support. First, we present a fine-grain mechanism to disable cache lines in private caches that have bitcell failures at the target near-threshold voltage. Second, we propose two mechanisms to better manage the capacity-stressed private caches. (1) We utilize OS-level classification of private and shared data and evaluate a data placement mechanism that dynamically relocates private data blocks to the LLC slice physically co-located with the requesting core. (2) We propose an in-hardware yet low-overhead runtime profiling of the locality of each cache line classified as private data, and allow such data to be cached in the private caches only if it shows high spatio-temporal locality. These mechanisms allow the private caches to rely on the local LLC slice to cache low-locality private data efficiently, freeing space to hold the more frequently used private data (as well as the shared data). We show that combining cache line disabling with efficient cache management of private data achieves better application completion times than a single-error-correction, double-error-detection (SECDED) ECC mechanism and/or cache line disabling alone.
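
The abstract combines two ideas: disabling individual private-cache lines whose bitcells fail at the target near-threshold voltage, and filtering low-locality private data out of the shrunken private cache so it is served by the core's local LLC slice instead. The sketch below is a minimal illustration of that combination, assuming a toy direct-mapped private cache model; the names (PrivateCache, REUSE_THRESHOLD, disable_line) and the simple reuse-counter bypass policy are illustrative assumptions, not the paper's actual hardware design.

```cpp
// Hypothetical sketch (not the paper's implementation): a tiny private cache
// model with (a) per-line disable bits for lines whose SRAM bitcells fail at
// the target near-threshold voltage, and (b) a per-block reuse counter so that
// private data is allocated in the private cache only once it has shown enough
// reuse; otherwise it is left to the core's local LLC slice ("bypass").

#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

constexpr int NUM_LINES       = 8;  // tiny direct-mapped cache for illustration
constexpr int REUSE_THRESHOLD = 2;  // reuses needed before a private block is
                                    // treated as "high locality" (assumed value)

struct Line {
    bool     disabled = false;      // set if a bitcell fails at the target voltage
    bool     valid    = false;
    uint64_t tag      = 0;
};

class PrivateCache {
public:
    // Mark a line with bitcell failures (e.g., found by a boot-time test pass).
    void disable_line(int idx) { lines_[idx].disabled = true; }

    // Returns true on a hit. On a miss, a private block is allocated only if
    // its profiled reuse count meets the threshold; otherwise the access is
    // assumed to be served by the local LLC slice.
    bool access(uint64_t block_addr, bool is_private) {
        int   idx = static_cast<int>(block_addr % NUM_LINES);
        Line& l   = lines_[idx];

        if (!l.disabled && l.valid && l.tag == block_addr) {
            ++reuse_[block_addr];   // runtime locality profiling
            return true;            // hit in the private cache
        }

        bool allocate = !l.disabled &&
                        (!is_private || reuse_[block_addr] >= REUSE_THRESHOLD);
        if (allocate) {
            l.valid = true;
            l.tag   = block_addr;
        }
        ++reuse_[block_addr];
        return false;               // miss (served by the local LLC slice)
    }

private:
    Line lines_[NUM_LINES];
    std::unordered_map<uint64_t, int> reuse_;  // per-block reuse counters
};

int main() {
    PrivateCache l1;
    l1.disable_line(3);             // line 3 fails at the near-threshold voltage

    std::vector<uint64_t> trace = {16, 16, 16, 24, 32, 16};
    for (uint64_t a : trace)
        std::cout << "block " << a
                  << (l1.access(a, /*is_private=*/true)
                          ? " hit\n" : " miss (LLC slice)\n");
}
```

In this toy trace, the low-reuse blocks (24 and 32) never displace data from the private cache and fall back to the local LLC slice, while the frequently reused block (16) crosses the reuse threshold, gets allocated, and subsequently hits; that is the capacity-management intuition the abstract describes, reduced to its simplest form.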