Reducing L1 caches power by exploiting software semantics

  • Authors and affiliations:
  • Zhen Fang (NVIDIA, Austin, TX, USA)
  • Li Zhao (Intel, Hillsboro, OR, USA)
  • Xiaowei Jiang (Intel, Hillsboro, OR, USA)
  • Shih-Lien Lu (Intel, Hillsboro, OR, USA)
  • Ravi Iyer (Intel, Hillsboro, OR, USA)
  • Tong Li (Intel, Hillsboro, OR, USA)
  • Seung Eun Lee (Seoul National University of Science and Technology, Seoul, South Korea)

  • Venue:
  • Proceedings of the 2012 ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED)
  • Year:
  • 2012

Abstract

To access a set-associative L1 cache in a high-performance processor, all ways of the selected set are searched and fetched in parallel using physical address bits. Such a cache is oblivious of memory references' software semantics, such as the stack-heap bifurcation of the memory space and user-kernel ring levels. This wastes energy: for example, a user-mode instruction fetch will never hit a cache block that contains kernel code, and a stack access will never hit a cache line that contains heap data. We propose to exploit software semantics in cache design to avoid unnecessary associative searches, thus reducing dynamic power consumption. Specifically, we utilize virtual memory region properties to optimize the data cache and ring-level information to optimize the instruction cache. Our design does not impact performance and incurs a very small hardware cost. Simulation results using SPEC CPU and SPECjapps indicate that the proposed designs reduce cache block fetches from the DL1 and IL1 by 27% and 57% respectively, resulting in average savings of 15% of DL1 power and more than 30% of IL1 power compared to an aggressively clock-gated baseline.
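The core idea in the abstract, probing only the ways whose stored semantic tag (e.g., stack vs. heap) can possibly match the access, can be illustrated with a toy simulation. The sketch below is not the authors' design; it is a minimal, hypothetical model of a set-associative cache in which each block carries a region tag, and a lookup skips ways whose region differs from the access, counting how many way fetches are saved versus a baseline that probes every way.

```python
from enum import Enum

class Region(Enum):
    STACK = 0
    HEAP = 1

class Way:
    def __init__(self):
        self.valid = False
        self.tag = None
        self.region = None

class SemanticCache:
    """Toy set-associative cache: each block is tagged with the memory
    region (stack/heap) it came from, and a lookup only fetches ways
    whose region matches the access -- a stack access cannot hit a way
    holding heap data, so those ways are never read."""

    def __init__(self, num_sets=64, assoc=4):
        self.num_sets = num_sets
        self.sets = [[Way() for _ in range(assoc)] for _ in range(num_sets)]
        self.way_fetches = 0        # ways actually fetched with filtering
        self.baseline_fetches = 0   # ways a conventional cache would fetch

    def access(self, addr, region):
        idx = (addr >> 6) % self.num_sets   # assume 64-byte blocks
        tag = addr >> 6
        ways = self.sets[idx]
        self.baseline_fetches += len(ways)  # baseline probes all ways
        hit = False
        for w in ways:
            if w.valid and w.region != region:
                continue                    # filtered out: region mismatch
            self.way_fetches += 1
            if w.valid and w.tag == tag:
                hit = True
        if not hit:
            # Simplistic fill policy: first invalid way, else way 0.
            for w in ways:
                if not w.valid:
                    w.valid, w.tag, w.region = True, tag, region
                    break
            else:
                ways[0].valid, ways[0].tag, ways[0].region = True, tag, region
        return hit
```

Running a mixed stack/heap access stream through this model shows `way_fetches` falling below `baseline_fetches`; the gap is a rough proxy for the dynamic-power savings the paper measures, though the real design also handles region reclassification and misprediction cases that this sketch omits.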