Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The performance potential of data dependence speculation & collapsing
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Array data flow analysis for load-store optimizations in fine-grain architectures
International Journal of Parallel Programming - Special issue: selected papers from the eighth international workshop on languages and compilers for parallel computing
Proceedings of the 24th annual international symposium on Computer architecture
Streamlining inter-operation memory communication via data dependence prediction
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The predictability of data values
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Value locality and speculative execution
Value locality and speculative execution
Understanding the differences between value prediction and instruction reuse
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Value prediction in VLIW machines
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Load-reuse analysis: design and evaluation
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Read-after-read memory dependence prediction
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
An architectural alternative to optimizing compilers
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
Global Context-Based Value Prediction
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Automatic Generation of Microarchitecture Simulators
ICCL '98 Proceedings of the 1998 International Conference on Computer Languages
Load and store reuse using register file contents
ICS '01 Proceedings of the 15th international conference on Supercomputing
An efficient static analysis algorithm to detect redundant memory operations
Proceedings of the 2002 workshop on Memory system performance
International Journal of Parallel Programming - Special issue: Workshop on application specific processors (WASP)
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Journal of Embedded Computing - Embeded Processors and Systems: Architectural Issues and Solutions for Emerging Applications
Early detection and bypassing of trivial operations to improve energy efficiency of processors
Microprocessors & Microsystems
Limits for a feasible speculative trace reuse implementation
International Journal of High Performance Systems Architecture
A unified approach to eliminate memory accesses early
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Dynamic method to evaluate code optimization effectiveness
Proceedings of the 15th International Workshop on Software and Compilers for Embedded Systems
A system for debugging via online tracing and dynamic slicing
Software—Practice & Experience
Hi-index | 0.01 |
Instruction reuse techniques have been developed to detect and remove redundancy at runtime. By maintaining the execution history of an instruction, reuse techniques detect if a subsequent execution of an instruction will yield the same result as its previous execution, and if this is the case, the result is made available to dependent instructions without executing the instruction. This approach eliminates same instruction redundancy, that is, redundancy across different dynamic instances of the same static instruction. However, the main limitation of existing instruction reuse techniques is that they do not detect or eliminate different instruction redundancy, that is, redundancy across dynamic instances of static ally distinct instructions.We present instruction reuse techniques for load redundancy removal that eliminate both same and different instruction redundancy. We first present a study that shows that in addition to significant levels of same instruction redundancy (average of 20%), load instructions also contain high levels (average of 35%) of different instruction redundancy arising at other load or store instructions. We also describe studies that characterize the behavior of the redundancy and develop a hardware implementation guided by this characterization. Our experiments show that our techniques yield IPC improvements of up to 11% and reduces off-chip traffic due to cache misses by as much as 32% for SPECint95 benchmarks.