Cache design trade-offs for power and performance optimization: a case study
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
Increasing cache port efficiency for dynamic superscalar microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Proceedings of the 24th annual international symposium on Computer architecture
On high-bandwidth data cache design for multi-issue processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The predictability of data values
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
IEEE/ACM Transactions on Networking (TON)
An empirical analysis of instruction repetition
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Dynamic removal of redundant computations
ICS '99 Proceedings of the 13th international conference on Supercomputing
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
A flexible accelerator for layer 7 networking applications
Proceedings of the 39th annual Design Automation Conference
A Statistically Rigorous Approach for Improving Simulation Methodology
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Load Redundancy Removal through Instruction Reuse
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
A pipelined memory architecture for high throughput network processors
Proceedings of the 30th annual international symposium on Computer architecture
Efficient use of memory bandwidth to improve network processor throughput
Proceedings of the 30th annual international symposium on Computer architecture
Reducing Design Complexity of the Load/Store Queue
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Snort - Lightweight Intrusion Detection for Networks
LISA '99 Proceedings of the 13th USENIX conference on System administration
Hi-index | 0.00 |
Reducing the number of data cache accesses improves performance, port efficiency, bandwidth and motivates the use of single ported caches instead of complex and expensive multi-ported ones. In this paper we consider an intrusion detection system as a target application and study the effectiveness of two techniques - (i) prefetching data from the cache into local buffers in the processor core and (ii) load Instruction Reuse (IR) - in reducing data cache traffic. The analysis is carried out using a microarchitecture and instruction set representative of a programmable processor with the aim of determining if the above techniques are viable for a programmable pattern matching engine found in many network processors. We find that IR is the most generic and efficient technique which reduces cache traffic by up to 60%. However, a combination of prefetching and IR with application specific tuning performs as well as and sometimes better than IR alone.