Load and store reuse using register file contents
ICS '01 Proceedings of the 15th international conference on Supercomputing
The Alpha 21264 Microprocessor
IEEE Micro
Cache designs for energy efficiency
HICSS '95 Proceedings of the 28th Hawaii International Conference on System Sciences
Reducing data cache energy consumption via cached load/store queue
Proceedings of the 2003 international symposium on Low power electronics and design
Caching Values in the Load Store Queue
MASCOTS '04 Proceedings of the The IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Scalable Store-Load Forwarding via Store Queue Index Prediction
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
POWER4 system microarchitecture
IBM Journal of Research and Development
Zero loads: canceling load requests by tracking zero values
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
L1 data cache power reduction using a forwarding predictor
PATMOS'10 Proceedings of the 20th international conference on Integrated circuit and system design: power and timing modeling, optimization and simulation
Hi-index | 0.00 |
This paper presents a study on macro data load, an efficient mechanism to enhance loaded value reuse. A macro data load brings into the processor a maximum-width data value the cache port allows, saves it in an internal structure, and facilitates reuse by later loads. A comprehensive limit study using a generalized memory value reuse table (MVRT) shows the significantly increased reuse opportunities provided by macro data load. We also describe a modified load store queue design as an implementation of the proposed concept. Our quantitative study shows that over 35% of L1 cache accesses in the SPEC2k integer and MiBench programs can be eliminated, resulting in a related energy reduction of 24% and 35% on average, respectively.