The effectiveness of Instruction Reuse (IR), a technique that eliminates redundant computations at run time, is limited: its performance gain seldom exceeds 3% and depends on the criticality of the instructions being "reused". In this paper, we focus on the power aspect of IR and propose a "resultbus optimization" that exploits communication reuse to reduce the power dissipated over a high-capacitance result bus. The benefit of this optimization depends on the number of result-producing instructions that are reused; for a 1024-entry "Reuse Buffer" (RB), it improves overall power and Energy-Delay Product (EDP) by 3% over a base IR policy. As a domain-specific study, we examine the impact of multithreading on IR in the context of packet header processing applications. Specifically, sharing the RB among threads can lead to either constructive or destructive interference, thereby increasing or decreasing the amount of IR that can be uncovered. Further, packet header processing applications are unique in that repetition of data values within "flows" is quite prevalent, and this repetition can be exploited to improve IR. We find that an architecture that uses this "flow" information to govern accesses to the RB improves IR by as much as 4.6% for header processing kernels.
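The abstract names a Reuse Buffer shared among threads and accessed under flow information, but gives no implementation details. The Python sketch below is a hypothetical model under stated assumptions, not the paper's design: entries are keyed by (PC, operand values), the buffer is fully associative with LRU replacement, `bus_writes_avoided` is only a rough proxy for the result bus optimization, and `FlowPartitionedRB` (steering each flow ID to its own partition) is just one illustrative way flow information could govern RB accesses. The names `ReuseBuffer`, `FlowPartitionedRB`, and `make_trace` are invented for illustration. The trace shows destructive interference: in a shared RB, a flow with no value repetition evicts another flow's reusable entry, while flow-based steering preserves it.

```python
import operator
from collections import OrderedDict
from typing import Callable, Hashable, List, Tuple

# Hypothetical trace record: (flow_id, pc, operand values, operation).
Access = Tuple[int, int, Tuple[int, ...], Callable[..., int]]


class ReuseBuffer:
    """Hypothetical Instruction Reuse buffer (RB) model.

    Assumptions, not taken from the paper: entries are keyed by
    (pc, operand values), the buffer is fully associative with LRU
    replacement, and a hit supplies the stored result without re-executing.
    """

    def __init__(self, capacity: int = 1024) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[Tuple[Hashable, ...], int]" = OrderedDict()
        self.lookups = 0
        self.hits = 0
        # Hypothetical proxy for the "resultbus optimization": count results
        # that a hit could deliver without driving the result bus.
        self.bus_writes_avoided = 0

    def execute(self, pc: int, operands: Tuple[int, ...],
                op: Callable[..., int]) -> int:
        self.lookups += 1
        key = (pc, operands)
        if key in self.entries:
            self.entries.move_to_end(key)  # refresh LRU position on a hit
            self.hits += 1
            self.bus_writes_avoided += 1
            return self.entries[key]
        result = op(*operands)  # miss: execute the instruction normally
        self.entries[key] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return result


class FlowPartitionedRB:
    """Hypothetical flow-governed RB: each flow ID selects one partition,
    so a flow without value reuse cannot evict another flow's entries."""

    def __init__(self, capacity: int = 1024, n_parts: int = 4) -> None:
        self.parts: List[ReuseBuffer] = [
            ReuseBuffer(capacity // n_parts) for _ in range(n_parts)
        ]

    def execute(self, flow_id: int, pc: int, operands: Tuple[int, ...],
                op: Callable[..., int]) -> int:
        return self.parts[flow_id % len(self.parts)].execute(pc, operands, op)

    @property
    def hits(self) -> int:
        return sum(p.hits for p in self.parts)


def make_trace(rounds: int = 3) -> List[Access]:
    """Hypothetical two-flow trace: flow 0 repeats one address-mask AND
    (as a header field might within a flow); flow 1 never repeats."""
    trace: List[Access] = []
    for r in range(rounds):
        trace.append((0, 0x40, (0x0A000001, 0xFF000000), operator.and_))
        trace.append((1, 0x80, (2 * r,), operator.neg))
        trace.append((1, 0x80, (2 * r + 1,), operator.neg))
    return trace


trace = make_trace()
shared = ReuseBuffer(capacity=2)
for _flow, pc, ops, op in trace:
    shared.execute(pc, ops, op)
split = FlowPartitionedRB(capacity=2, n_parts=2)
for flow, pc, ops, op in trace:
    split.execute(flow, pc, ops, op)
print(shared.hits, split.hits)  # 0 2
```

With the same total capacity, the shared buffer never hits because flow 1's non-repeating values keep pushing flow 0's entry out, whereas the partitioned buffer hits on flow 0's repeated computation after its first execution. A real RB would be set-indexed by PC with limited associativity; full associativity is used here only to keep the sketch short.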