A Framework for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy

  • Authors:
  • Jason Zebchuk;Elham Safi;Andreas Moshovos

  • Affiliations:
  • -;-;-

  • Venue:
  • Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2007

Quantified Score

Hi-index 0.02

Visualization

Abstract

Current on-chip block-centric memory hierarchies exploit access patterns at the fine-grain scale of small blocks. Several recently proposed techniques for coherence traffic reduction and prefetching suggest that further useful patterns emerge with a macroscopic, coarse-grain view. To exploit coarse- grain behavior, previous work extended conventional caches with additional coarse-grain tracking and management structures considerably increasing overall cost and complexity. This paper demonstrates that as multi-megabyte caches have become commonplace, coarse-grain tracking and management no longer needs to be an afterthought. This functionality comes "for free" via RegionTracker. RegionTracker is a dual-grain cache design that maintains block-level communication while directly supporting coarse-grain tracking and management. Compared to a block-centric conventional cache of the same data capacity, RegionTracker requires less area to achieve a nearly identical miss rate (within 1%). RegionTracker can be used as the building block for coarse-grain optimizations, reducing their overall cost and easing their adoption. Using full-system simulation of a quad- core chip multiprocessor, commercial workloads, and area estimates based on full-custom layouts on a 130nm commercial technology, we demonstrate the performance and cost viability of the RegionTracker design. We also demonstrate the potential of RegionTracker as a framework for coarse-grain optimizations by showing that it boosts the benefits and reduces the cost of a previously proposed snoop reduction technique.