Context-sensitive interprocedural points-to analysis in the presence of function pointers
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Dynamic memory disambiguation using the memory conflict buffer
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Exceeding the dataflow limit via value prediction
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 24th annual international symposium on Computer architecture
The predictability of data values
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Can program profiling support value prediction?
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Complete removal of redundant expressions
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Modeling program predictability
Proceedings of the 25th annual international symposium on Computer architecture
Understanding the differences between value prediction and instruction reuse
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
An empirical analysis of instruction repetition
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Value speculation scheduling for high performance processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
An architectural alternative to optimizing compilers
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
Exploiting Basic Block Value Locality with Block Reuse
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation
Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation
Hardware support for dynamic activation of compiler-directed computation reuse
ACM SIGPLAN Notices
Slipstream processors: improving both performance and fault tolerance
ACM SIGPLAN Notices
A study of slipstream processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
On the potential of tolerant region reuse for multimedia applications
ICS '01 Proceedings of the 15th international conference on Supercomputing
Hardware support for dynamic activation of compiler-directed computation reuse
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Slipstream processors: improving both performance and fault tolerance
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Rapid profiling via stratified sampling
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Balancing Reuse Opportunities and Performance Gains with Subblock Value Reuse
IEEE Transactions on Computers
A Compiler Scheme for Reusing Intermediate Computation Results
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
International Journal of Parallel Programming - Special issue: Workshop on application specific processors (WASP)
Fuzzy Memoization for Floating-Point Multimedia Applications
IEEE Transactions on Computers
Region-level approximate computation reuse for power reduction in multimedia applications
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
SoftSig: software-exposed hardware signatures for code analysis and optimization
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Journal of Embedded Computing - Embeded Processors and Systems: Architectural Issues and Solutions for Emerging Applications
Partial resolution for redundant operation table
Microprocessors & Microsystems
Reducing misspeculation penalty in trace-level speculative multithreaded architectures
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
Hi-index | 0.01 |
Recent studies on value locality reveal that many instructions are frequently executed with a small variety of inputs. This paper proposes an approach that integrates architecture and compiler techniques to exploit value locality for large regions of code. The approach strives to eliminate redundant processor execution created by both instruction-level input repetition and recurrence of input data within high-level computations. In this approach, the compiler performs analysis to identify code regions whose computation can be reused during dynamic execution. The instruction set architecture provides a simple interface for the compiler to communicate the scope of each reuse region and its live-out register information to the hardware. During run time, the execution results of these reusable computation regions are recorded into hardware buffers for potential reuse. Each reuse can eliminate the execution of a large number of dynamic instructions. Furthermore, the actions needed to update the live-out registers can be performed at a higher degree of parallelism than the original code, breaking intrinsic dataflow dependence constraints. Initial results show that the compiler analysis can indeed identify large reuse regions. Overall, the approach can improve the performance of a 6-issue microarchitecture by an average of 30% for a collection of SPEC and integer benchmarks.