This paper addresses the dynamic energy consumption of L1 data cache interfaces in out-of-order superscalar processors. The proposed Multiple Access Low Energy Cache (MALEC) is based on the observation that consecutive memory references tend to access the same page. It performs at a level similar to state-of-the-art caches but consumes approximately 48% less energy. This is achieved by deliberately restricting accesses to a single page per cycle, which allows the use of single-ported TLBs and cache banks and simplifies the lookup structures of the Store and Merge Buffers. To mitigate performance penalties, MALEC shares memory address translation results among multiple memory references and shares data among loads to the same cache line. In addition, it uses a Page-Based Way Determination scheme that holds way information for recently accessed cache lines in small storage structures called way tables. These way tables are closely coupled to TLB lookups and can simultaneously service all accesses to a particular page. Moreover, the scheme removes the need for the redundant tag-array accesses that are usually required to confirm way predictions. For the analyzed workloads, MALEC achieves average energy savings of 48% in the L1 data memory subsystem over a high-performance cache interface that supports up to 2 loads and 1 store in parallel. Compared with a low-power configuration limited to 1 load or 1 store per cycle, MALEC and the high-performance interface achieve performance gains of 14% and 15% while requiring 22% less and 48% more energy, respectively. Furthermore, Page-Based Way Determination exhibits a coverage of 94%, a 16% improvement over the originally proposed line-based way determination.
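The page-based bookkeeping described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the 4 KiB page size, 64-byte line size, four-entry table, LRU replacement, and every name (`WayTable`, `record`, `lookup`, `invalidate`, `select_same_page`) are hypothetical choices made for illustration. It shows one way table entry per page holding a way slot for each cache line of that page, plus a helper that picks the accesses sharing the page of the oldest pending reference.

```python
from collections import OrderedDict

PAGE_BITS = 12  # assumption: 4 KiB pages
LINE_BITS = 6   # assumption: 64-byte cache lines
LINES_PER_PAGE = 1 << (PAGE_BITS - LINE_BITS)


def split(addr):
    """Return (page number, line index within that page)."""
    return addr >> PAGE_BITS, (addr >> LINE_BITS) & (LINES_PER_PAGE - 1)


class WayTable:
    """Hypothetical model: one entry per page, one way slot per cache line."""

    def __init__(self, entries=4):
        self.entries = entries       # assumed capacity, mirroring a small TLB
        self.pages = OrderedDict()   # page -> list of way index or None

    def lookup(self, addr):
        """Return the known way for addr, or None if it is not tracked."""
        page, line = split(addr)
        ways = self.pages.get(page)
        if ways is None:
            return None
        self.pages.move_to_end(page)  # mark the page as most recently used
        return ways[line]

    def record(self, addr, way):
        """Remember which way holds the line containing addr."""
        page, line = split(addr)
        if page not in self.pages:
            if len(self.pages) >= self.entries:
                self.pages.popitem(last=False)  # evict least recently used page
            self.pages[page] = [None] * LINES_PER_PAGE
        self.pages[page][line] = way
        self.pages.move_to_end(page)

    def invalidate(self, addr):
        """Forget the way of a line, e.g. after the cache evicts it."""
        page, line = split(addr)
        if page in self.pages:
            self.pages[page][line] = None


def select_same_page(pending):
    """Pick the oldest pending access and all others that hit its page."""
    if not pending:
        return []
    page = split(pending[0])[0]
    return [a for a in pending if split(a)[0] == page]
```

In this model a non-`None` result from `lookup` is treated as definitive because `invalidate` is assumed to be called whenever the cache evicts a line; that assumption stands in for the property that no confirming tag-array access is needed.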