A Framework for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy

Authors:
Jason Zebchuk;Elham Safi;Andreas Moshovos
Venue:
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2007

Citing 0
Cited 14

Energy-efficient MESI cache coherence with pro-active snoop filtering for multicore microprocessors
Proceedings of the 13th international symposium on Low power electronics and design

A framework for low energy data management in reconfigurable multi-context architectures
Journal of Systems Architecture: the EUROMICRO Journal

Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture

Virtual tree coherence: Leveraging regions and in-network multicast trees for scalable cache coherence
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture

Zero-content augmented caches
Proceedings of the 23rd international conference on Supercomputing

In-network coherence filtering: snoopy coherence without broadcasts
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture

Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks
Proceedings of the 38th annual international symposium on Computer architecture

Reducing last level cache pollution through OS-level software-controlled region-based partitioning
Proceedings of the 27th Annual ACM Symposium on Applied Computing

A dual grain hit-miss detector for large die-stacked DRAM caches
Proceedings of the Conference on Design, Automation and Test in Europe

Building expressive, area-efficient coherence directories
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

TLC: a tag-less cache for reducing dynamic first level cache energy
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Decoupled compressed cache: exploiting spatial locality for energy-optimized compressed caching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Heterogeneous system coherence for integrated CPU-GPU systems
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

DP&TB: a coherence filtering protocol for many-core chip multiprocessors
The Journal of Supercomputing

Abstract

Current on-chip block-centric memory hierarchies exploit access patterns at the fine-grain scale of small blocks. Several recently proposed techniques for coherence traffic reduction and prefetching suggest that further useful patterns emerge with a macroscopic, coarse-grain view. To exploit coarse-grain behavior, previous work extended conventional caches with additional coarse-grain tracking and management structures, considerably increasing overall cost and complexity. This paper demonstrates that, as multi-megabyte caches have become commonplace, coarse-grain tracking and management no longer need to be an afterthought. This functionality comes "for free" via RegionTracker, a dual-grain cache design that maintains block-level communication while directly supporting coarse-grain tracking and management. Compared to a conventional block-centric cache of the same data capacity, RegionTracker requires less area to achieve a nearly identical miss rate (within 1%). RegionTracker can be used as the building block for coarse-grain optimizations, reducing their overall cost and easing their adoption. Using full-system simulation of a quad-core chip multiprocessor, commercial workloads, and area estimates based on full-custom layouts in a 130 nm commercial technology, we demonstrate the performance and cost viability of the RegionTracker design. We also demonstrate the potential of RegionTracker as a framework for coarse-grain optimizations by showing that it boosts the benefits and reduces the cost of a previously proposed snoop reduction technique.
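
To make the dual-grain idea concrete, the following is a minimal sketch of how block-level lookups and region-level queries can be answered from one structure. It is an illustration only, not the RegionTracker design: the abstract gives no block or region sizes, so `BLOCK_BYTES`, `REGION_BYTES`, the `DualGrainTracker` class, and all of its method names are assumptions, and capacity, associativity, and replacement are ignored.

```python
from dataclasses import dataclass, field

BLOCK_BYTES = 64      # assumed block size (not stated in the abstract)
REGION_BYTES = 1024   # assumed region size (not stated in the abstract)


def region_of(addr: int) -> int:
    """Region number that contains the byte address."""
    return addr // REGION_BYTES


def block_in_region(addr: int) -> int:
    """Index of the block within its region."""
    return (addr % REGION_BYTES) // BLOCK_BYTES


@dataclass
class DualGrainTracker:
    """Toy model: one presence bitmask per region that has cached blocks."""
    regions: dict = field(default_factory=dict)  # region number -> bitmask

    def insert_block(self, addr: int) -> None:
        r = region_of(addr)
        self.regions[r] = self.regions.get(r, 0) | (1 << block_in_region(addr))

    def evict_block(self, addr: int) -> None:
        r = region_of(addr)
        mask = self.regions.get(r, 0) & ~(1 << block_in_region(addr))
        if mask:
            self.regions[r] = mask
        else:
            # Last block of the region left; drop the region entry.
            self.regions.pop(r, None)

    def has_block(self, addr: int) -> bool:
        """Fine-grain query: is this specific block present?"""
        mask = self.regions.get(region_of(addr), 0)
        return bool((mask >> block_in_region(addr)) & 1)

    def region_cached(self, addr: int) -> bool:
        """Coarse-grain query: is any block of this region present?"""
        return region_of(addr) in self.regions

    def blocks_cached_in_region(self, addr: int) -> int:
        """Coarse-grain query: how many blocks of this region are present?"""
        return bin(self.regions.get(region_of(addr), 0)).count("1")
```

In this sketch a coarse-grain optimization, such as a snoop filter, would call `region_cached` to decide whether a whole region can be skipped, while ordinary accesses continue to use `has_block`.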