The benefits of advances in processor technology have long been held hostage to the widening processor-memory gap. Off-chip memory access latency is one of the most critical parameters limiting system performance. Caches alleviate this problem by reducing the average memory access latency. The memory bottleneck assumes even greater significance for high-performance architectures with high data throughput requirements, such as network processors.

This paper addresses the memory bottleneck with the goal of minimizing off-chip memory demand and average memory access latency by proposing the use of small, application-specific, compiler-visible data trace caches. We focus on tree data structures, which are responsible for a significant component of the memory traffic in several applications. We have observed that tree accesses create a simple-to-characterize trace of memory references, and we propose a data trace cache design that exploits the locality of reference in these data traces.

Our study reveals that, for accesses to rooted tree data structures, data trace caches reduce the total number of misses by between 7% and 53% compared to a conventional cache, across a variety of applications and small cache sizes (256 - 1024 bytes). Such caches are in keeping with the philosophy of victim caches, stream buffers, and prefetch buffers in that relatively small investments in silicon can realize substantive reductions in off-chip memory bandwidth demand.
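As a loose illustration (not the paper's actual hardware design), the locality the abstract describes can be modeled by treating each tree lookup as a trace of node addresses: successive lookups that share a root-to-leaf prefix repeat the same leading references, which a small trace buffer could serve without touching off-chip memory. A minimal sketch, using a binary search tree and a hypothetical one-entry trace buffer:

```python
# Sketch: tree lookups produce address traces with prefix locality.
# Node identities stand in for memory addresses; the "trace cache"
# here simply holds the previous root-to-leaf trace.

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def lookup_trace(root, key):
    """Return the trace of node addresses touched by one search."""
    trace, node = [], root
    while node is not None:
        trace.append(id(node))  # stand-in for the node's memory address
        if key < node.key:
            node = node.left
        elif key > node.key:
            node = node.right
        else:
            break
    return trace

root = None
for k in [8, 4, 12, 2, 6, 10, 14]:
    root = insert(root, k)

prev, hits, refs = [], 0, 0
for key in [6, 6, 2, 10]:  # consecutive lookups with shared prefixes
    trace = lookup_trace(root, key)
    # The one-entry trace buffer serves the prefix shared with the
    # previous lookup; the remainder goes to "memory".
    shared = 0
    for a, b in zip(prev, trace):
        if a != b:
            break
        shared += 1
    hits += shared
    refs += len(trace)
    prev = trace
print(f"{hits}/{refs} references served from the one-entry trace buffer")
```

With this access sequence, half of the references hit in the buffer; a repeated lookup replays its entire trace, while lookups to nearby keys still share the upper levels of the tree.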