Low power cache architectures with hybrid approach of filtering unnecessary way accesses

Authors:
Lingjun Fan;Shinan Wang;Yasong Zheng;Weisong Shi;Dongrui Fan
Affiliations:
State Key Laboratory of Computer Architecture, ICT, CAS, Beijing, China;Wayne State University, Detroit;State Key Laboratory of Computer Architecture, ICT, CAS, Beijing, China;Wayne State University, Detroit;State Key Laboratory of Computer Architecture, ICT, CAS, Beijing, China
Venue:
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
Year:
2013

Abstract

Power has been a major concern in processor design for several years. As caches account for a growing share of CPU die area and power, this paper proposes filtering unnecessary way accesses to reduce the dynamic power consumption of a unified L2 cache shared by instructions and data. Our methods include the Invalid Filter, which eliminates accesses to cache ways that contain invalid blocks; the I/D Filter, which eliminates accesses to cache ways whose blocks do not match the instruction/data type of the access; and the Tag-2 Filter, which eliminates accesses to cache ways whose blocks mismatch in the lowest 2 bits of the tag. Because these methods reduce activity within the cache, dynamic CPU power can be significantly decreased. We also propose combining the three methods (Invalid+I/D+Tag-2 Filter) into what we call the Way-Filtering Cache, in an attempt to achieve better power savings. Our evaluations show that, compared with the Invalid+I/D Filter, we obtain a 19.6%-47.8% improvement (34.3% on average) on a 64K 4-way cache and a 19.6%-55.2% improvement (39.2% on average) on a 128K 8-way cache; compared with the Invalid+Tag-2 Filter, we obtain a 6.1%-27.7% improvement (16.6% on average) on a 64K 4-way cache and a 6.9%-44.4% improvement (25.0% on average) on a 128K 8-way cache. Compared with Tag-Data caches, which are commonly used for less latency-sensitive caches (e.g., a unified L2 or shared last-level cache), our Way-Filtering Cache achieves an 18.3%-29.2% improvement (23.1% on average) on a 64K 4-way cache and a 27.2%-50.1% improvement (41.1% on average) on a 128K 8-way cache.
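
The three filters described in the abstract can be illustrated with a small software model. The following is a minimal sketch, not the authors' hardware design: the `Way` record, the `ways_to_probe` function, and the assumption that each way stores a valid bit, an instruction/data type bit, and its tag are all hypothetical names and simplifications introduced here for illustration. It shows which ways of one cache set would still need to be accessed after the Invalid, I/D, and Tag-2 checks.

```python
from dataclasses import dataclass


@dataclass
class Way:
    """Hypothetical per-way filter state for one cache set."""
    valid: bool      # valid bit of the block held in this way
    is_instr: bool   # True if the block holds instructions, False for data
    tag: int         # full tag of the block


def ways_to_probe(ways, req_tag, req_is_instr):
    """Return the indices of ways that survive all three filters."""
    survivors = []
    for index, way in enumerate(ways):
        if not way.valid:
            continue  # Invalid Filter: skip ways holding invalid blocks
        if way.is_instr != req_is_instr:
            continue  # I/D Filter: skip ways whose block type mismatches
        if (way.tag & 0b11) != (req_tag & 0b11):
            continue  # Tag-2 Filter: skip ways whose lowest 2 tag bits differ
        survivors.append(index)
    return survivors


# One set of a 4-way cache (illustrative values only).
cache_set = [
    Way(valid=False, is_instr=False, tag=0x000),  # removed by Invalid Filter
    Way(valid=True, is_instr=True, tag=0x1A5),    # removed by I/D Filter
    Way(valid=True, is_instr=False, tag=0x2B6),   # removed by Tag-2 Filter
    Way(valid=True, is_instr=False, tag=0x1A5),   # must still be accessed
]

# A data access with tag 0x1A5 needs to access only one of the four ways.
probe = ways_to_probe(cache_set, req_tag=0x1A5, req_is_instr=False)
print(probe)
```

In this model a surviving way is only a candidate: the full tag comparison still decides hit or miss. The sketch counts which way accesses are avoided and says nothing about the energy cost of the filter bits themselves.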