On the effectiveness of flow aggregation in improving instruction reuse in network processing applications

Authors:
G. Surendra;S. Banerjee;S. K. Nandy
Affiliations:
CAD Laboratory, Supercomputer Education and Research Center, Indian Institute of Science, Bangalore, India 560012;CAD Laboratory, Supercomputer Education and Research Center, Indian Institute of Science, Bangalore, India 560012;CAD Laboratory, Supercomputer Education and Research Center, Indian Institute of Science, Bangalore, India 560012
Venue:
International Journal of Parallel Programming - Special issue: Workshop on application specific processors (WASP)
Year:
2003

Citing 19
Cited 2

Value locality and load value prediction

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Exceeding the dataflow limit via value prediction

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Low power data processing by elimination of redundant computations

ISLPED '97 Proceedings of the 1997 international symposium on Low power electronics and design
Dynamic instruction reuse

Proceedings of the 24th annual international symposium on Computer architecture
The predictability of data values

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Understanding the differences between value prediction and instruction reuse

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
An empirical analysis of instruction repetition

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Accelerating multi-media processing by implementing memoing in multiplication and division units

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Dynamic removal of redundant computations

ICS '99 Proceedings of the 13th international conference on Supercomputing
Compiler-directed dynamic computation reuse: rationale and initial results

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Characterizing processor architectures for programmable network interfaces

Proceedings of the 14th international conference on Supercomputing
Wattch: a framework for architectural-level power analysis and optimizations

Proceedings of the 27th annual international symposium on Computer architecture
Network processors: a perspective on market requirements, processor architectures and embedded S/W tools

Proceedings of the conference on Design, automation and test in Europe
Hardware support for dynamic activation of compiler-directed computation reuse

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Handling of packet dependencies: a critical issue for highly parallel network processors

CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
NetBench: a benchmarking suite for network processors

Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Flexible Control of Parallelism in a Multiprocessor PC Router

Proceedings of the General Track: 2002 USENIX Annual Technical Conference
Load Redundancy Removal through Instruction Reuse

ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
CommBench-a telecommunications benchmark for network processors

ISPASS '00 Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software

Instruction Reuse in SPEC, media and packet processing benchmarks: A comparative study of power, performance and related microarchitectural optimizations

Journal of Embedded Computing - Embeded Processors and Systems: Architectural Issues and Solutions for Emerging Applications
Packet prediction for speculative cut-through switching

Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Instruction Reuse is a microarchitectural technique that exploits dynamic instruction repetition to remove redundant computations at run-time. In this paper we examine instruction reuse of integer ALU and load instructions in network processing applications and attempt to answer the following questions: (1) How much of instruction repetition can be reused in packet processing applications?, (2) Can the temporal locality of network traffic be exploited to reduce interference in the Reuse Buffer and improve reuse? and (3) What is the effect of reuse on microarchitectural features such as resource contention and memory accesses? We use an execution driven simulation methodology to evaluate instruction reuse and find that for the benchmarks considered, 1 to 50% of the dynamic instructions are reused yielding performance improvement between 1 and 20%. To further improve reuse, a flow aggregation scheme as well as an architecture for exploiting the same is proposed. This scheme is mostly applicable to header processing applications and exploits temporal locality in packet data to uncover higher reuse. As a side effect, instruction reuse reduces memory traffic and improves performance.