Improving the accuracy of snoop filtering using stream registers

Authors:
Valentina Salapura;Matthias Blumrich;Alan Gara
Affiliations:
IBM Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Venue:
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Year:
2007

Citing 11
Cited 6

SPLASH: Stanford parallel applications for shared-memory

ACM SIGARCH Computer Architecture News
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors

Proceedings of the 2002 international symposium on Low power electronics and design
The AzusA 16-Way Itanium Server

IEEE Micro
The Augmint multiprocessor simulation toolkit for Intel x86 architectures

ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
Optimizing pipelines for power and performance

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
The AMD Opteron Processor for Multiprocessor Servers

IEEE Micro
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Micro-architecture techniques in the intel® E8870 scalable memory controller

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Power and performance optimization at the system level

Proceedings of the 2nd conference on Computing frontiers
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence

Proceedings of the 32nd annual international symposium on Computer Architecture

Handling task dependencies under strided and aliased references

Proceedings of the 24th ACM International Conference on Supercomputing
TurboTag: lookup filtering to reduce coherence directory power

Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
Exploring the architecture of a stream register-based snoop filter

Transactions on high-performance embedded architectures and compilers III
Filtering directory lookups in CMPs with write-through caches

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Filtering directory lookups in CMPs

Microprocessors & Microsystems
Heterogeneous system coherence for integrated CPU-GPU systems

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

Multi-core processors have become mainstream; they provide parallelism with relatively low complexity. As true on-chip SMPs evolve, coherence traffic between cores is becoming problematic, both in terms of performance and power. The negative effects of coherence (snoop) traffic can be significantly mitigated through snoop filtering. Shielding each cache with a device that can squash snoop requests for addresses known not to be in cache improves performance significantly for caches that cannot perform normal load and snoop lookups simultaneously. In addition, reducing snoop lookups yields power savings. This paper introduces Stream Register snoop filtering, which captures the spatial locality of multiple memory reference streams in a few registers. We propose a snoop filter that combines Stream Registers with "snoop caching", a mechanism that captures the temporal locality of frequently accessed addresses. Simulations of Splash- 2 benchmarks on a 4-core multiprocessor illustrate tradeoffs and strengths of these two techniques. Their combination is most effective, eliminating 94-99% of all snoop requests using very few stream registers and snoop cache lines.