A class of compatible cache consistency protocols and their support by the IEEE futurebus
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Adjustable block size coherent caches
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The PowerPC architecture: a specification for a new family of RISC processors
The PowerPC architecture: a specification for a new family of RISC processors
Decoupled sectored caches: conciliating low tag implementation cost
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
IEEE Transactions on Parallel and Distributed Systems
The pool of subsectors cache design
ICS '99 Proceedings of the 13th international conference on Supercomputing
Timestamp snooping: an approach for extending SMPs
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors
Proceedings of the 2002 international symposium on Low power electronics and design
The sun fireplane system interconnect
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
A dynamic cache sub-block design to reduce false sharing
ICCD '95 Proceedings of the 1995 International Conference on Computer Design: VLSI in Computers and Processors
Experimental evaluation of on-chip microprocessor cache memories
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Token coherence: decoupling performance and correctness
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the 30th annual international symposium on Computer architecture
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd annual international symposium on Computer Architecture
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd annual international symposium on Computer Architecture
Store Memory-Level Parallelism Optimizations for Commercial Applications
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Victim management in a cache hierarchy
IBM Journal of Research and Development - Advanced silicon technology
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Application-aware snoop filtering for low-power cache coherence in embedded multiprocessors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Proceedings of the 45th annual Design Automation Conference
Energy-efficient MESI cache coherence with pro-active snoop filtering for multicore microprocessors
Proceedings of the 13th international symposium on Low power electronics and design
NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
To Snoop or Not to Snoop: Evaluation of Fine-Grain and Coarse-Grain Snoop Filtering Techniques
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 23rd international conference on Supercomputing
In-network coherence filtering: snoopy coherence without broadcasts
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Low-power snoop architecture for synchronized producer-consumer embedded multiprocessing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Cohesion: a hybrid memory model for accelerators
Proceedings of the 37th annual international symposium on Computer architecture
TurboTag: lookup filtering to reduce coherence directory power
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
WAYPOINT: scaling coherence to thousand-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Subspace snooping: filtering snoops with operating system support
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Virtual Snooping: Filtering Snoops in Virtualized Multi-cores
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A workload-adaptive and reconfigurable bus architecture for multicore processors
International Journal of Reconfigurable Computing
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks
Proceedings of the 38th annual international symposium on Computer architecture
Filtering directory lookups in CMPs with write-through caches
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Filtering directory lookups in CMPs
Microprocessors & Microsystems
Switch-based packing technique to reduce traffic and latency in token coherence
Journal of Parallel and Distributed Computing
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Using partial tag comparison in low-power snoop-based chip multiprocessors
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Spatiotemporal Coherence Tracking
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 40th Annual International Symposium on Computer Architecture
Building expressive, area-efficient coherence directories
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Multi-grain coherence directories
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Heterogeneous system coherence for integrated CPU-GPU systems
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
DP&TB: a coherence filtering protocol for many-core chip multiprocessors
The Journal of Supercomputing
Hi-index | 0.00 |
To main coherence in conventional shared-memory multiprocessor systems, processors first check other proessors' caches before obtaining data from memory. This coherence checking adds latency to memory requests and leads to large amounts of interconnect traffic in broadcast-based systems. Our results for a set of commercial, scientific and multiprogrammed workloads show that on average 67% (and up to 94%) of broadcasts are unnecessary. Coarse-Grain Coherence Tracking is a new technique that supplements a conventional coherence mechanism and optimizes the performance of coherence enforcement. The Coarse-Grain Coherence mechanism monitors the coherence status of large regions of memory, and uses that information to avoid unnecessary broadcasts. Coarse-Grain Coherence Tracking is shown to eliminate 55-97% of the unnecessary broadcasts, and improve performance by 8.8% on average (and up to 21.7%).