Compiler-directed dynamic computation reuse: rationale and initial results

Authors:
Daniel A. Connors;Wen-mei W. Hwu
Affiliations:
Department of Electrical and Computer Engineering, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL;Department of Electrical and Computer Engineering, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL
Venue:
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Year:
1999

Citing 16
Cited 20

Context-sensitive interprocedural points-to analysis in the presence of function pointers

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Dynamic memory disambiguation using the memory conflict buffer

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Value locality and load value prediction

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Exceeding the dataflow limit via value prediction

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Dynamic instruction reuse

Proceedings of the 24th annual international symposium on Computer architecture
The predictability of data values

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Value profiling

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Can program profiling support value prediction?

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Complete removal of redundant expressions

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Modeling program predictability

Proceedings of the 25th annual international symposium on Computer architecture
Understanding the differences between value prediction and instruction reuse

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
An empirical analysis of instruction repetition

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Value speculation scheduling for high performance processors

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
An architectural alternative to optimizing compilers

ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
Exploiting Basic Block Value Locality with Block Reuse

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation

Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation

Hardware support for dynamic activation of compiler-directed computation reuse

ACM SIGPLAN Notices
Slipstream processors: improving both performance and fault tolerance

ACM SIGPLAN Notices
A study of slipstream processors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
On the potential of tolerant region reuse for multimedia applications

ICS '01 Proceedings of the 15th international conference on Supercomputing
Hardware support for dynamic activation of compiler-directed computation reuse

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Slipstream processors: improving both performance and fault tolerance

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Better exploration of region-level value locality with integrated computation reuse and value prediction

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Rapid profiling via stratified sampling

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Balancing Reuse Opportunities and Performance Gains with Subblock Value Reuse

IEEE Transactions on Computers
A Compiler Scheme for Reusing Intermediate Computation Results

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
On the effectiveness of flow aggregation in improving instruction reuse in network processing applications

International Journal of Parallel Programming - Special issue: Workshop on application specific processors (WASP)
Fuzzy Memoization for Floating-Point Multimedia Applications

IEEE Transactions on Computers
Region-level approximate computation reuse for power reduction in multimedia applications

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
SoftSig: software-exposed hardware signatures for code analysis and optimization

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Instruction Reuse in SPEC, media and packet processing benchmarks: A comparative study of power, performance and related microarchitectural optimizations

Journal of Embedded Computing - Embeded Processors and Systems: Architectural Issues and Solutions for Emerging Applications
Partial resolution for redundant operation table

Microprocessors & Microsystems
Reducing misspeculation penalty in trace-level speculative multithreaded architectures

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
“Look it up" or "do the math": an energy, area, and timing analysis of instruction reuse and memoization

PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems

Quantified Score

Hi-index	0.01

Visualization

Abstract

Recent studies on value locality reveal that many instructions are frequently executed with a small variety of inputs. This paper proposes an approach that integrates architecture and compiler techniques to exploit value locality for large regions of code. The approach strives to eliminate redundant processor execution created by both instruction-level input repetition and recurrence of input data within high-level computations. In this approach, the compiler performs analysis to identify code regions whose computation can be reused during dynamic execution. The instruction set architecture provides a simple interface for the compiler to communicate the scope of each reuse region and its live-out register information to the hardware. During run time, the execution results of these reusable computation regions are recorded into hardware buffers for potential reuse. Each reuse can eliminate the execution of a large number of dynamic instructions. Furthermore, the actions needed to update the live-out registers can be performed at a higher degree of parallelism than the original code, breaking intrinsic dataflow dependence constraints. Initial results show that the compiler analysis can indeed identify large reuse regions. Overall, the approach can improve the performance of a 6-issue microarchitecture by an average of 30% for a collection of SPEC and integer benchmarks.