Dynamic instruction reuse

Authors:
Avinash Sodani; Gurindar S. Sohi
Affiliations:
Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI (both authors)
Venue:
Proceedings of the 24th annual international symposium on Computer architecture
Year:
1997

Abstract

This paper introduces the concept of dynamic instruction reuse. Empirical observations suggest that many instructions, and groups of instructions, are executed repeatedly with the same inputs. Such instructions do not have to be executed again: their results can be read from a buffer in which they were saved on an earlier execution. This paper presents three hardware schemes for exploiting dynamic instruction reuse and evaluates their effectiveness using execution-driven simulation. We find that in some cases over 50% of the instructions can be reused. The resulting speedups, though less striking than the percentage of instructions reused, are still significant.
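The idea of reading a saved result from a buffer can be illustrated with a small software model. The sketch below is not one of the three hardware schemes evaluated in the paper. It is a minimal illustration that assumes a buffer indexed by instruction address (PC), holding the operand values and result of the most recent execution at that address. The names ReuseBuffer, ReuseEntry, run, and demo are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class ReuseEntry:
    operands: Tuple[int, ...]  # operand values seen the last time this PC executed
    result: int                # result computed for those operand values


class ReuseBuffer:
    """Toy model (an assumption, not the paper's design): one entry per PC."""

    def __init__(self) -> None:
        self.entries: Dict[int, ReuseEntry] = {}
        self.reused = 0    # instructions whose result came from the buffer
        self.executed = 0  # instructions that had to be computed

    def run(self, pc: int, op: Callable[..., int], *operands: int) -> int:
        entry = self.entries.get(pc)
        # Reuse test: same instruction address and same operand values.
        if entry is not None and entry.operands == operands:
            self.reused += 1
            return entry.result
        # Miss: execute the operation and save its inputs and result.
        result = op(*operands)
        self.executed += 1
        self.entries[pc] = ReuseEntry(operands, result)
        return result


def demo() -> Tuple[int, int]:
    """Run one 'add' instruction at a fixed PC over repeating inputs."""
    buf = ReuseBuffer()

    def add(a: int, b: int) -> int:
        return a + b

    for a, b in [(1, 2), (1, 2), (1, 2), (3, 4), (3, 4)]:
        buf.run(0x40, add, a, b)
    return buf.reused, buf.executed


if __name__ == "__main__":
    reused, executed = demo()
    print(f"reused={reused} executed={executed}")
```

In this toy run, three of the five dynamic instances are served from the buffer. A real design would also have to bound the buffer size and handle loads whose memory location may have been overwritten, which this sketch ignores.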