Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Exceeding the dataflow limit via value prediction
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Low power data processing by elimination of redundant computations
ISLPED '97 Proceedings of the 1997 international symposium on Low power electronics and design
Proceedings of the 24th annual international symposium on Computer architecture
The predictability of data values
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Understanding the differences between value prediction and instruction reuse
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
An empirical analysis of instruction repetition
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Accelerating multi-media processing by implementing memoing in multiplication and division units
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Dynamic removal of redundant computations
ICS '99 Proceedings of the 13th international conference on Supercomputing
Compiler-directed dynamic computation reuse: rationale and initial results
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Characterizing processor architectures for programmable network interfaces
Proceedings of the 14th international conference on Supercomputing
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the conference on Design, automation and test in Europe
Hardware support for dynamic activation of compiler-directed computation reuse
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Handling of packet dependencies: a critical issue for highly parallel network processors
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
NetBench: a benchmarking suite for network processors
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Flexible Control of Parallelism in a Multiprocessor PC Router
Proceedings of the General Track: 2002 USENIX Annual Technical Conference
Load Redundancy Removal through Instruction Reuse
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
CommBench-a telecommunications benchmark for network processors
ISPASS '00 Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software
Journal of Embedded Computing - Embeded Processors and Systems: Architectural Issues and Solutions for Emerging Applications
Packet prediction for speculative cut-through switching
Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Hi-index | 0.00 |
Instruction Reuse is a microarchitectural technique that exploits dynamic instruction repetition to remove redundant computations at run-time. In this paper we examine instruction reuse of integer ALU and load instructions in network processing applications and attempt to answer the following questions: (1) How much of instruction repetition can be reused in packet processing applications?, (2) Can the temporal locality of network traffic be exploited to reduce interference in the Reuse Buffer and improve reuse? and (3) What is the effect of reuse on microarchitectural features such as resource contention and memory accesses? We use an execution driven simulation methodology to evaluate instruction reuse and find that for the benchmarks considered, 1 to 50% of the dynamic instructions are reused yielding performance improvement between 1 and 20%. To further improve reuse, a flow aggregation scheme as well as an architecture for exploiting the same is proposed. This scheme is mostly applicable to header processing applications and exploits temporal locality in packet data to uncover higher reuse. As a side effect, instruction reuse reduces memory traffic and improves performance.