An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Summary cache: a scalable wide-area web cache sharing protocol
IEEE/ACM Transactions on Networking (TON)
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors
Proceedings of the 2002 international symposium on Low power electronics and design
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd annual international symposium on Computer Architecture
Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking
Proceedings of the 32nd annual international symposium on Computer Architecture
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
DSD '06 Proceedings of the 9th EUROMICRO Conference on Digital System Design
An 8-core, 64-thread, 64-bit power efficient sparc soc (niagara2)
Proceedings of the 2007 international symposium on Physical design
A New Solution to Coherence Problems in Multicache Systems
IEEE Transactions on Computers
Improving the accuracy of snoop filtering using stream registers
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
IBM Journal of Research and Development
Cache system design in the tightly coupled multiprocessor system
AFIPS '76 Proceedings of the June 7-10, 1976, national computer conference and exposition
ACM SIGARCH Computer Architecture News
In-network coherence filtering: snoopy coherence without broadcasts
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
TurboTag: lookup filtering to reduce coherence directory power
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
SigNet: network-on-chip filtering for coarse vector directories
Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Coherence protocols consume an important fraction of power to determine which coherence action to perform. Specifically, on CMPs with shared cache and directory-based coherence protocol implemented as a duplicate of local caches tags, we have observed that a big fraction of directory lookups cause a miss, because the block looked up is not allocated in any local cache. To reduce the number of directory lookups and therefore the power consumption, we propose to add a filter before the directory access. We introduce two filter implementations. In the first one, filtering information is explicitly kept in the shared cache for every block. In the second one, filtering information is decoupled from the shared cache organization, so the filter size does not depend on the shared cache size. We evaluate our filters in a CMP with 8 in-order processors with 4 threads each and a memory hierarchy with write-through local caches and a shared cache. We show that, for SPLASH2 benchmarks, the proposed filters reduce the number of directory lookups performed by 60% while power consumption is reduced by ~28%. For Specweb2005, the number of directory lookups performed is reduced by 68% (44%), while directory power consumption is reduced by 19% (9%) using the first (second) filter implementation.