The benefits of advances in processor technology have long been held hostage to the widening processor-memory gap. Off-chip memory access latency is one of the most critical parameters limiting system performance. Caches alleviate this problem by reducing the average memory access latency. The memory bottleneck assumes even greater significance for high-performance architectures with high data throughput requirements, such as network processors.

This paper addresses the memory bottleneck with the goal of minimizing off-chip memory demand and average memory access latency by proposing the use of small, application-specific, compiler-visible data trace caches. We focus on tree data structures, which are responsible for a significant component of the memory traffic in several applications. We have observed that tree accesses create a simple-to-characterize trace of memory references, and we propose a data trace cache design that exploits the locality of reference in these data traces.

Our study reveals that, for accesses to rooted tree data structures, data trace caches reduce the total number of misses by between 7% and 53% compared to a conventional cache, across a variety of applications and small cache sizes (256 - 1024 bytes). Such caches are in keeping with the philosophy of victim caches, stream buffers, and prefetch buffers in that relatively small investments in silicon can realize substantive reductions in off-chip memory bandwidth demand.
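As a loose illustration (not the paper's actual hardware design), the locality the abstract describes can be modeled by treating each tree lookup as a trace of node addresses: successive lookups that share a root-to-leaf prefix repeat the same leading references, which a small trace buffer could serve without touching off-chip memory. A minimal sketch, using a binary search tree and a hypothetical one-entry trace buffer:

```python
# Sketch: tree lookups produce address traces with prefix locality.
# Node identities stand in for memory addresses; the "trace cache"
# here simply holds the previous root-to-leaf trace.

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def lookup_trace(root, key):
    """Return the trace of node addresses touched by one search."""
    trace, node = [], root
    while node is not None:
        trace.append(id(node))  # stand-in for the node's memory address
        if key < node.key:
            node = node.left
        elif key > node.key:
            node = node.right
        else:
            break
    return trace

root = None
for k in [8, 4, 12, 2, 6, 10, 14]:
    root = insert(root, k)

prev, hits, refs = [], 0, 0
for key in [6, 6, 2, 10]:  # consecutive lookups with shared prefixes
    trace = lookup_trace(root, key)
    # The one-entry trace buffer serves the prefix shared with the
    # previous lookup; the remainder goes to "memory".
    shared = 0
    for a, b in zip(prev, trace):
        if a != b:
            break
        shared += 1
    hits += shared
    refs += len(trace)
    prev = trace
print(f"{hits}/{refs} references served from the one-entry trace buffer")
```

With this access sequence, half of the references hit in the buffer; a repeated lookup replays its entire trace, while lookups to nearby keys still share the upper levels of the tree.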