Instruction Reuse in SPEC, media and packet processing benchmarks: A comparative study of power, performance and related microarchitectural optimizations

Authors:
G. Surendra;Subhasis Banerjee;S. K. Nandy
Affiliations:
CAD Lab, SERC, Indian Institute of Science, Bangalore 560012, India (all authors). Corresponding author: surendra@cadl.iisc.ernet.in. E-mail: surendra@cadl.iisc.ernet.in, subhasis@cadl.iisc.ernet.in, nandy@serc.iisc.ernet.in
Venue:
Journal of Embedded Computing - Embedded Processors and Systems: Architectural Issues and Solutions for Emerging Applications
Year:
2006

Abstract

Instruction Reuse (IR), a technique that eliminates redundant computations at run time, is of limited effectiveness as a performance optimization: the gain seldom exceeds 3% and depends on the criticality of the instructions being "reused". In this paper, we focus on the power aspect of IR and propose a "resultbus optimization" that exploits communication reuse to reduce the power dissipated over a high-capacitance result bus. The effectiveness of this optimization depends on the number of result-producing instructions that are reused; it improves overall power and Energy-Delay Product (EDP) by 3% over a base IR policy for a 1024-entry "Reuse Buffer" (RB). As a domain-specific study, we examine the impact of multithreading on IR in the context of packet header processing applications. Specifically, sharing the RB among threads can lead to either constructive or destructive interference, thereby increasing or decreasing the amount of IR that can be uncovered. Further, packet header processing applications are unique in that repetition of data values within "flows" is quite prevalent and can be exploited to improve IR. We find that an architecture that uses this "flow" information to govern accesses to the RB improves IR by as much as 4.6% for header processing kernels.
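
To make the idea of a Reuse Buffer concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes a value-indexed buffer keyed on the instruction address and operand values, with LRU replacement; the class name ReuseBuffer, the method names, and the replacement policy are all illustrative assumptions. The helper edp simply computes energy multiplied by delay, the usual definition of the Energy-Delay Product.

```python
from collections import OrderedDict


class ReuseBuffer:
    """Toy reuse buffer (hypothetical): maps (pc, operand values) to a result.

    Assumptions for illustration only: value-based indexing and LRU
    replacement. The paper's actual RB organization is not given here.
    """

    def __init__(self, entries=1024):
        self.entries = entries
        self.table = OrderedDict()  # (pc, operands) -> cached result
        self.lookups = 0
        self.hits = 0

    def execute(self, pc, operands, compute):
        """Return the result for this instruction, reusing it if possible."""
        key = (pc, operands)
        self.lookups += 1
        if key in self.table:
            # Same instruction, same inputs: skip the computation.
            self.hits += 1
            self.table.move_to_end(key)  # mark as most recently used
            return self.table[key]
        result = compute(*operands)
        self.table[key] = result
        if len(self.table) > self.entries:
            self.table.popitem(last=False)  # evict least recently used
        return result

    def reuse_rate(self):
        """Fraction of lookups that were satisfied from the buffer."""
        return self.hits / self.lookups if self.lookups else 0.0


def edp(energy_joules, delay_seconds):
    """Energy-Delay Product: energy multiplied by execution time."""
    return energy_joules * delay_seconds


if __name__ == "__main__":
    rb = ReuseBuffer(entries=1024)
    add = lambda a, b: a + b
    # A tiny made-up trace of (pc, operands); repeated pairs can be reused.
    trace = [(0x400, (1, 2)), (0x404, (3, 4)), (0x400, (1, 2)), (0x400, (5, 6))]
    for pc, ops in trace:
        rb.execute(pc, ops, add)
    print(f"reuse rate: {rb.reuse_rate():.2f}")
```

In this sketch a smaller buffer evicts entries sooner, so the measured reuse rate depends on capacity as well as on how often operand values repeat in the trace.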