Implementing Precise Interrupts in Pipelined Processors
IEEE Transactions on Computers
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Exceeding the dataflow limit via value prediction
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
An architectural alternative to optimizing compilers
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
ON DIVISION AND RECIPROCAL CACHES
ON DIVISION AND RECIPROCAL CACHES
A computer architecture for the dynamic optimization of high-level language programs
A computer architecture for the dynamic optimization of high-level language programs
Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation
Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
The predictability of data values
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Highly accurate data value prediction using hybrid predictors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Modeling program predictability
Proceedings of the 25th annual international symposium on Computer architecture
Pipeline gating: speculation control for energy reduction
Proceedings of the 25th annual international symposium on Computer architecture
Load latency tolerance in dynamically scheduled processors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Understanding the differences between value prediction and instruction reuse
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A dynamic multithreading processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
An empirical analysis of instruction repetition
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Accelerating multi-media processing by implementing memoing in multiplication and division units
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Reducing branch misprediction penalties via dynamic control independence detection
ICS '99 Proceedings of the 13th international conference on Supercomputing
Dynamic removal of redundant computations
ICS '99 Proceedings of the 13th international conference on Supercomputing
Control independence in trace processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Compiler-directed dynamic computation reuse: rationale and initial results
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Delaying physical register allocation through virtual-physical registers
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Limits of Data Value Predictability
International Journal of Parallel Programming
Extending Value Reuse to Basic Blocks with Compiler Support
IEEE Transactions on Computers
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Reconfigurable caches and their application to media processing
Proceedings of the 27th annual international symposium on Computer architecture
Overcoming the challenges to feedback-directed optimization (Keynote Talk)
DYNAMO '00 Proceedings of the ACM SIGPLAN workshop on Dynamic and adaptive compilation and optimization
Hardware support for dynamic activation of compiler-directed computation reuse
ACM SIGPLAN Notices
Slipstream processors: improving both performance and fault tolerance
ACM SIGPLAN Notices
Register integration: a simple and efficient implementation of squash reuse
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
A study of slipstream processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Compiler controlled value prediction using branch predictor based confidence
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Performance improvement with circuit-level speculation
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
On the potential of tolerant region reuse for multimedia applications
ICS '01 Proceedings of the 15th international conference on Supercomputing
Load and store reuse using register file contents
ICS '01 Proceedings of the 15th international conference on Supercomputing
Hardware support for dynamic activation of compiler-directed computation reuse
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Slipstream processors: improving both performance and fault tolerance
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Dynamically allocating processor resources between nearby and distant ILP
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Rapid profiling via stratified sampling
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Silent Stores and Store Value Locality
IEEE Transactions on Computers
Characterization of value locality in Java programs
Workload characterization of emerging computer applications
Speculative dynamic vectorization
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Exploiting speculative value reuse using value prediction
CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
Skipper: a microarchitecture for exploiting control-flow independence
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
An Exploration of Instruction Fetch Requirement in Out-of-Order Superscalar Processors
International Journal of Parallel Programming
Increasing Instruction-Level Parallelism with Instruction Precomputation (Research Note)
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Register File Energy Reduction by Operand Data Reuse
PATMOS '02 Proceedings of the 12th International Workshop on Integrated Circuit Design. Power and Timing Modeling, Optimization and Simulation
Execution Latency Reduction via Variable Latency Pipeline and Instruction Reuse
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
An efficient static analysis algorithm to detect redundant memory operations
Proceedings of the 2002 workshop on Memory system performance
Three extensions to register integration
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Master/slave speculative parallelization
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Recycling waste: exploiting wrong-path execution to improve branch prediction
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
A Statistically Rigorous Approach for Improving Simulation Methodology
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Load Redundancy Removal through Instruction Reuse
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Proceedings of the 2003 international symposium on Low power electronics and design
Balancing Reuse Opportunities and Performance Gains with Subblock Value Reuse
IEEE Transactions on Computers
Multiple-path execution for chip multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Using Interaction Costs for Microarchitectural Bottleneck Analysis
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Scheduling Reusable Instructions for Power Reduction
Proceedings of the conference on Design, automation and test in Europe - Volume 1
A Compiler Scheme for Reusing Intermediate Computation Results
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Proceedings of the 31st annual international symposium on Computer architecture
International Journal of Parallel Programming - Special issue: Workshop on application specific processors (WASP)
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Interaction cost and shotgun profiling
ACM Transactions on Architecture and Code Optimization (TACO)
Control Flow Optimization Via Dynamic Reconvergence Prediction
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Studying Storage-Recomputation Tradeoffs in Memory-Constrained Embedded Processing
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Control-Flow Independence Reuse via Dynamic Vectorization
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Improving Energy-Efficiency by Bypassing Trivial Computations
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 11 - Volume 12
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Compiling for memory emergency
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
RENO: A Rename-Based Instruction Optimizer
Proceedings of the 32nd annual international symposium on Computer Architecture
Opportunistic Transient-Fault Detection
Proceedings of the 32nd annual international symposium on Computer Architecture
Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization
Proceedings of the 32nd annual international symposium on Computer Architecture
Fuzzy Memoization for Floating-Point Multimedia Applications
IEEE Transactions on Computers
Improving Computer Architecture Simulation Methodology by Adding Statistical Rigor
IEEE Transactions on Computers
SST: Symbolic Subordinate Threading
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Opportunistic Transient-Fault Detection
IEEE Micro
BranchTap: improving performance with very few checkpoints through adaptive speculation control
Proceedings of the 20th annual international conference on Supercomputing
Performance Enhancement by Eliminating Redundant Function Execution
ANSS '06 Proceedings of the 39th annual Symposium on Simulation
By-passing the out-of-order execution pipeline to increase energy-efficiency
Proceedings of the 4th international conference on Computing frontiers
Speculative trivialization point advancing in high-performance processors
Journal of Systems Architecture: the EUROMICRO Journal
Ginger: control independence using tag rewriting
Proceedings of the 34th annual international symposium on Computer architecture
Transparent control independence (TCI)
Proceedings of the 34th annual international symposium on Computer architecture
Working with process variation aware caches
Proceedings of the conference on Design, automation and test in Europe
Design and evaluation of an auto-memoization processor
PDCN'07 Proceedings of the 25th conference on Proceedings of the 25th IASTED International Multi-Conference: parallel and distributed computing and networks
SoftSig: software-exposed hardware signatures for code analysis and optimization
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Block remap with turnoff: a variation-tolerant cache design technique
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Low power microarchitecture with instruction reuse
Proceedings of the 5th conference on Computing frontiers
Improving single-thread performance with fine-grain state maintenance
Proceedings of the 5th conference on Computing frontiers
Journal of Embedded Computing - Embeded Processors and Systems: Architectural Issues and Solutions for Emerging Applications
Partial resolution for redundant operation table
Microprocessors & Microsystems
Early detection and bypassing of trivial operations to improve energy efficiency of processors
Microprocessors & Microsystems
Fetch-Criticality Reduction through Control Independence
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Zero loads: canceling load requests by tracking zero values
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Exploiting selective instruction reuse and value prediction in a superscalar architecture
Journal of Systems Architecture: the EUROMICRO Journal
Reexecution and Selective Reuse in Checkpoint Processors
Transactions on High-Performance Embedded Architectures and Compilers II
Architecture Design for Soft Errors
Architecture Design for Soft Errors
Reducing misspeculation penalty in trace-level speculative multithreaded architectures
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
EXACT: explicit dynamic-branch prediction with active updates
Proceedings of the 7th ACM international conference on Computing frontiers
A pattern based instruction encoding technique for high performance architectures
International Journal of High Performance Systems Architecture
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
The potential of using dynamic information flow analysis in data value prediction
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Instruction precomputation with memoization for fault detection
Proceedings of the Conference on Design, Automation and Test in Europe
Window memoization: an efficient hardware architecture for high-performance image processing
Journal of Real-Time Image Processing
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A unified approach to eliminate memory accesses early
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
SYRANT: SYmmetric resource allocation on not-taken and taken paths
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Exploiting a computation reuse cache to reduce energy in network processors
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Leveraging Strength-Based Dynamic Information Flow Analysis to Enhance Data Value Prediction
ACM Transactions on Architecture and Code Optimization (TACO)
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
Exploring the potential of architecture-level power optimizations
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
Dynamic method to evaluate code optimization effectiveness
Proceedings of the 15th International Workshop on Software and Compilers for Embedded Systems
Proceedings of the 27th international ACM conference on International conference on supercomputing
Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
MELOADES: Methodology for long-term online adaptation of embedded software for heterogeneous devices
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.02 |
This paper introduces the concept of dynamic instruction reuse. Empirical observations suggest that many instructions, and groups of instructions, having the same inputs, are executed dynamically. Such instructions do not have to be executed repeatedly --- their results can be obtained from a buffer where they were saved previously. This paper presents three hardware schemes for exploiting the phenomenon of dynamic instruction reuse, and evaluates their effectiveness using execution-driven simulation. We find that in some cases over 50% of the instructions can be reused. The speedups so obtained, though less striking than the percentage of instructions reused, are still quite significant.