Data trace cache: an application specific cache architecture

  • Authors:
  • Subramanian Ramaswamy;Jaswanth Sreeram;Sudhakar Yalamanchili;Krishna V. Palem

  • Affiliations:
  • Georgia Institute of Technology, Atlanta, Georgia;Georgia Institute of Technology, Atlanta, Georgia;Georgia Institute of Technology, Atlanta, Georgia;Georgia Institute of Technology, Atlanta, Georgia

  • Venue:
  • MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications, systems and architecture
  • Year:
  • 2005

Abstract

Benefits of advances in processor technology have long been held hostage to the widening processor-memory gap. Off-chip memory access latency is one of the most critical parameters limiting system performance, and caches alleviate the problem by reducing the average memory access latency. The memory bottleneck assumes greater significance for high-performance computer architectures with high data throughput requirements, such as network processors.

This paper addresses the memory bottleneck, with the goal of minimizing off-chip memory demand and average memory access latency, by proposing small, application-specific, compiler-visible data trace caches. We focus on tree data structures, which are responsible for a significant component of the memory traffic in several applications. We have observed that tree accesses create a simple-to-characterize trace of memory references, and we propose a data trace cache design to exploit the locality of reference in these data traces.

Our study reveals that, for small cache sizes (256 to 1024 bytes) and across a variety of applications, data trace caches reduce the total number of misses on accesses to rooted tree data structures by 7% to 53% compared to a conventional cache. Such caches are in keeping with the philosophy of victim caches, stream buffers, and prefetch buffers, in that relatively small investments in silicon can realize substantive reductions in off-chip memory bandwidth demand.
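
To make the notion of a tree access trace concrete, the following minimal Python sketch records the nodes visited by root-to-leaf lookups and counts misses in a tiny cache. It is not the data trace cache design proposed in the paper, which the abstract does not specify. Everything here is an assumption made for illustration: the complete binary tree stored heap-style, the fully associative LRU stand-in cache, and the hypothetical names lookup_trace, shared_prefix, and count_misses.

```python
from collections import OrderedDict


def lookup_trace(key, depth):
    """Return the node indices visited from root to leaf.

    Assumption: a complete binary tree stored heap-style (root is node 1,
    children of node i are 2*i and 2*i + 1). The bits of `key`, most
    significant first, select the left (0) or right (1) child.
    """
    node = 1
    trace = [node]
    for level in range(depth - 1, -1, -1):
        bit = (key >> level) & 1
        node = 2 * node + bit
        trace.append(node)
    return trace


def shared_prefix(a, b):
    """Number of leading nodes that two traces have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def count_misses(traces, capacity):
    """Count misses in a fully associative LRU cache of `capacity` nodes.

    This is a generic stand-in cache, not the paper's design.
    """
    cache = OrderedDict()
    misses = 0
    for trace in traces:
        for node in trace:
            if node in cache:
                cache.move_to_end(node)  # mark as most recently used
            else:
                misses += 1
                cache[node] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return misses


if __name__ == "__main__":
    t1 = lookup_trace(5, 3)  # key 0b101
    t2 = lookup_trace(4, 3)  # key 0b100
    print(t1, t2, shared_prefix(t1, t2))
    print(count_misses([t1, t2], capacity=8))
    print(count_misses([t1, t2], capacity=2))
```

In this toy setting, every lookup starts at the root, so consecutive traces share a prefix of upper-level nodes. A cache that retains those nodes avoids refetching them, while one too small to hold a full path evicts them before they are reused.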