Proceedings of the 24th annual international symposium on Computer architecture
The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Dynamic removal of redundant computations
ICS '99 Proceedings of the 13th international conference on Supercomputing
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Load and store reuse using register file contents
ICS '01 Proceedings of the 15th international conference on Supercomputing
Energy-efficient load and store reuse
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Silent Stores and Store Value Locality
IEEE Transactions on Computers
Frequent value locality and its applications
ACM Transactions on Embedded Computing Systems (TECS)
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Energy efficient frequent value data cache design
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Load Redundancy Removal through Instruction Reuse
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Exploiting Value Locality in Physical Register Files
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Scalable Load and Store Processing in Latency Tolerant Processors
Proceedings of the 32nd annual international symposium on Computer Architecture
Address-Indexed Memory Disambiguation and Store-to-Load Forwarding
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Fire-and-Forget: Load/Store Scheduling with No Store Queue at All
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Selective writeback: reducing register file pressure and energy consumption
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Proceedings of the 23rd international conference on Supercomputing
Zero-Value Caches: Cancelling Loads that Return Zero
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Characterization and exploitation of narrow-width loads: the narrow-width cache approach
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Hi-index | 0.00 |
This paper introduces the notion of silent loads to classify load accesses that can be satisfied by already available values of the physical register file and proposes a new architectural concept to exploit such loads. The paper then unifies different approaches of eliminating memory accesses early by contributing with a new architectural scheme. We show that our unified approach covers previously proposed techniques of exploiting forwarded and small-value loads in addition to silent loads. Forwarded loads obtain values through load-to-load and store-to-load forwarding whereas small-value loads return small values that can be coded with 8 bits or less. We find that 22%, 31% and 24% of all dynamic loads are forwarded, small-value and silent, respectively. We demonstrate that the prevalence of such loads is mostly inherent in applications. We establish that a hypothetical scheme that encompasses all the categories can eliminate as many as 42% of all dynamic loads and about 18% of all committed stores. Finally, we contribute with a new architectural technique to implement the unified scheme. We show that our proposed scheme reduces execution time to provide noticeable speedup and reduces overall energy dissipation with very low area overhead.