Load Redundancy Removal through Instruction Reuse

Authors:
Jun Yang;Rajiv Gupta
Venue:
ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Year:
2000

Abstract

Instruction reuse techniques have been developed to detect and remove redundancy at runtime. By maintaining the execution history of an instruction, reuse techniques detect whether a subsequent execution of the instruction will yield the same result as its previous execution; if so, the result is made available to dependent instructions without executing the instruction. This approach eliminates same-instruction redundancy, that is, redundancy across different dynamic instances of the same static instruction. However, the main limitation of existing instruction reuse techniques is that they do not detect or eliminate different-instruction redundancy, that is, redundancy across dynamic instances of statically distinct instructions.

We present instruction reuse techniques for load redundancy removal that eliminate both same-instruction and different-instruction redundancy. We first present a study showing that, in addition to significant levels of same-instruction redundancy (20% on average), load instructions also contain high levels of different-instruction redundancy (35% on average) arising at other load or store instructions. We also describe studies that characterize the behavior of this redundancy and develop a hardware implementation guided by that characterization. Our experiments show that our techniques yield IPC improvements of up to 11% and reduce off-chip traffic due to cache misses by as much as 32% for SPECint95 benchmarks.
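The distinction between the two kinds of load redundancy can be illustrated with a small trace classifier. This is only a sketch under simplifying assumptions and not the hardware design described in the paper: the trace format, the function name `classify_loads`, and the unbounded per-address table are all hypothetical, and a load is attributed to whichever instruction most recently touched its address.

```python
from collections import Counter

def classify_loads(trace):
    """Classify each load in a trace of (op, pc, addr, value) tuples.

    Assumption: a load is redundant when the most recent load or store
    to the same address produced the same value. It counts as
    same-instruction redundancy if that earlier access came from the
    same static instruction (same pc), and as different-instruction
    redundancy otherwise.
    """
    last_access = {}  # addr -> (pc, value) of the most recent load or store
    counts = Counter()
    for op, pc, addr, value in trace:
        if op == "load":
            prev = last_access.get(addr)
            if prev is None or prev[1] != value:
                counts["not_redundant"] += 1
            elif prev[0] == pc:
                counts["same_instruction"] += 1
            else:
                counts["different_instruction"] += 1
        # Both loads and stores make the value available for later reuse.
        last_access[addr] = (pc, value)
    return counts

# Hypothetical trace: one store followed by four loads.
trace = [
    ("store", 0x10, 0x1000, 7),
    ("load",  0x20, 0x1000, 7),  # value came from the store at pc 0x10
    ("load",  0x20, 0x1000, 7),  # value came from the previous run of pc 0x20
    ("load",  0x30, 0x1000, 7),  # value came from the load at pc 0x20
    ("load",  0x30, 0x2000, 3),  # address not seen before
]
counts = classify_loads(trace)
```

A real reuse mechanism would have to work with a bounded table and handle invalidation, which this sketch deliberately leaves out; it only shows how a load can be redundant with respect to a different load or store rather than its own earlier execution.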