Reducing L1 caches power by exploiting software semantics

Authors:
Zhen Fang;Li Zhao;Xiaowei Jiang;Shih-lien Lu;Ravi Iyer;Tong Li;Seung Eun Lee
Affiliations:
NVIDIA, Austin, TX, USA;Intel, Hillsboro, OR, USA;Intel, Hillsboro, OR, USA;Intel, Hillsboro, OR, USA;Intel, Hillsboro, OR, USA;Intel, Hillsboro, OR, USA;Seoul National Univ of Science and Technology, Seoul, South Korea
Venue:
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Year:
2012

Abstract

To access a set-associative L1 cache in a high-performance processor, all ways of the selected set are searched and fetched in parallel using physical address bits. Such a cache is oblivious to the software semantics of memory references, such as the stack-heap bifurcation of the memory space and user-kernel ring levels. This wastes energy: for example, a user-mode instruction fetch will never hit a cache block that contains kernel code, and a stack access will not hit a cache block that contains heap data. We propose to exploit software semantics in cache design to avoid unnecessary associative searches, thus reducing dynamic power consumption. Specifically, we use virtual memory region properties to optimize the data cache and ring-level information to optimize the instruction cache. Our design does not impact performance and incurs very small hardware cost. Simulation results using SPEC CPU and SPECjapps indicate that the proposed designs reduce cache block fetches from DL1 and IL1 by 27% and 57%, respectively, resulting in average savings of 15% of DL1 power and more than 30% of IL1 power compared to an aggressively clock-gated baseline.
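
The idea of skipping ways that cannot hit can be illustrated with a toy model. The following is a minimal sketch, not the paper's design: it assumes each way of a set carries a hypothetical one-bit `region` label (for example user vs. kernel, or heap vs. stack) and that a lookup only probes valid ways whose label matches the access. The names `Way`, `FilteredSet`, `USER`, and `KERNEL` are invented for this illustration, and the abstract does not say how the real hardware stores or compares such labels.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical 1-bit semantic labels (assumption, not taken from the paper).
USER = 0
KERNEL = 1


@dataclass
class Way:
    valid: bool = False
    tag: int = 0
    region: int = USER  # semantic label recorded when the block was filled


class FilteredSet:
    """One set of a set-associative cache with a per-way semantic label."""

    def __init__(self, num_ways: int) -> None:
        self.ways: List[Way] = [Way() for _ in range(num_ways)]
        self.ways_probed = 0  # running count of ways actually searched

    def fill(self, way_index: int, tag: int, region: int) -> None:
        """Install a block and remember which region it belongs to."""
        self.ways[way_index] = Way(valid=True, tag=tag, region=region)

    def lookup(self, tag: int, region: int) -> Optional[int]:
        """Return the index of the hitting way, or None on a miss.

        Only valid ways whose label matches the access are probed; a
        conventional lookup would probe every way of the set instead.
        """
        candidates = [
            i for i, way in enumerate(self.ways)
            if way.valid and way.region == region
        ]
        # All candidates are probed together, as in a parallel lookup.
        self.ways_probed += len(candidates)
        for i in candidates:
            if self.ways[i].tag == tag:
                return i
        return None


# Usage: a 4-way set holding two user blocks and two kernel blocks.
cache_set = FilteredSet(num_ways=4)
cache_set.fill(0, tag=10, region=USER)
cache_set.fill(1, tag=11, region=USER)
cache_set.fill(2, tag=20, region=KERNEL)
cache_set.fill(3, tag=21, region=KERNEL)

user_hit = cache_set.lookup(tag=10, region=USER)
probes_after_first_lookup = cache_set.ways_probed
```

In this toy setup each lookup probes two of the four ways, whereas an unfiltered lookup would probe all four. The savings reported in the abstract depend on the actual mix of blocks in each set and on the power model, neither of which this sketch captures.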