Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Proceedings of the 24th annual international symposium on Computer architecture
The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Correlated load-address predictors
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Dynamic zero compression for cache energy reduction
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Load and store reuse using register file contents
ICS '01 Proceedings of the 15th international conference on Supercomputing
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Energy-efficient load and store reuse
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Silent Stores and Store Value Locality
IEEE Transactions on Computers
Frequent value locality and its applications
ACM Transactions on Embedded Computing Systems (TECS)
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Energy efficient frequent value data cache design
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Hybridizing and Coalescing Load Value Predictors
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Reducing data cache energy consumption via cached load/store queue
Proceedings of the 2003 international symposium on Low power electronics and design
Adaptive Cache Compression for High-Performance Processors
Proceedings of the 31st annual international symposium on Computer architecture
A Robust Main-Memory Compression Scheme
Proceedings of the 32nd annual international symposium on Computer Architecture
Scalable Load and Store Processing in Latency Tolerant Processors
Proceedings of the 32nd annual international symposium on Computer Architecture
Scalable Store-Load Forwarding via Store Queue Index Prediction
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Address-Indexed Memory Disambiguation and Store-to-Load Forwarding
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Characterization of simultaneous multithreading (SMT) efficiency in POWER5
IBM Journal of Research and Development - POWER5 and packaging
Reducing cache traffic and energy with macro data load
Proceedings of the 2006 international symposium on Low power electronics and design
Fire-and-Forget: Load/Store Scheduling with No Store Queue at All
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
NoSQ: Store-Load Communication without a Store Queue
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Cancellation of loads that return zero using zero-value caches
Proceedings of the 23rd international conference on Supercomputing
Hi-index | 0.00 |
The considerable gap between processor and DRAM speed and the power losses in the cache hierarchy calls for more efficient approaches. Broadly speaking, cache-hierarchy efficiency can be increased either by improving cache management or by reducing the number of load instructions that reach the cache hierarchy. We introduce the notion of zero loads to approach the latter. This paper explores the potential of tracking locations that contain the value 'zero'. Loads directed to such locations -- termed Zero Loads -- can be cancelled before they are issued in the cache hierarchy. We find that as many as 21% of the loads are Zero Loads and about one third of them are critical, i.e., ends up on the critical memory path for out-of-order cores. Motivated by this observation, we explore two innovative structures to capture Zero Loads by essentially book-keeping earlier visited blocks/locations that return 'zero'. These schemes are shown to be capable of improving performance and power/energy efficiency considerably.