SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors
Proceedings of the 2002 international symposium on Low power electronics and design
The AzusA 16-Way Itanium Server
IEEE Micro
The Augmint multiprocessor simulation toolkit for Intel x86 architectures
ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
Optimizing pipelines for power and performance
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Micro-architecture techniques in the intel® E8870 scalable memory controller
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Power and performance optimization at the system level
Proceedings of the 2nd conference on Computing frontiers
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd annual international symposium on Computer Architecture
Handling task dependencies under strided and aliased references
Proceedings of the 24th ACM International Conference on Supercomputing
TurboTag: lookup filtering to reduce coherence directory power
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
Exploring the architecture of a stream register-based snoop filter
Transactions on high-performance embedded architectures and compilers III
Filtering directory lookups in CMPs with write-through caches
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Filtering directory lookups in CMPs
Microprocessors & Microsystems
Heterogeneous system coherence for integrated CPU-GPU systems
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Multi-core processors have become mainstream; they provide parallelism with relatively low complexity. As true on-chip SMPs evolve, coherence traffic between cores is becoming problematic, both in terms of performance and power. The negative effects of coherence (snoop) traffic can be significantly mitigated through snoop filtering. Shielding each cache with a device that can squash snoop requests for addresses known not to be in cache improves performance significantly for caches that cannot perform normal load and snoop lookups simultaneously. In addition, reducing snoop lookups yields power savings. This paper introduces Stream Register snoop filtering, which captures the spatial locality of multiple memory reference streams in a few registers. We propose a snoop filter that combines Stream Registers with "snoop caching", a mechanism that captures the temporal locality of frequently accessed addresses. Simulations of Splash- 2 benchmarks on a 4-core multiprocessor illustrate tradeoffs and strengths of these two techniques. Their combination is most effective, eliminating 94-99% of all snoop requests using very few stream registers and snoop cache lines.