A class of compatible cache consistency protocols and their support by the IEEE futurebus
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Hierarchical cache/bus architecture for shared memory multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A cache consistency protocol for multiprocessors with multistage networks
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Parallel discrete event simulation
Communications of the ACM - Special issue on simulation
Journal of Computer and System Sciences
Race-free interconnection networks and multiprocessor consistency
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Deadlock-free multicast wormhole routing in multicomputer networks
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Orca: A Language for Parallel Programming of Distributed Systems
IEEE Transactions on Software Engineering
The Stanford Dash Multiprocessor
Computer
Cache Invalidation Patterns in Shared-Memory Multiprocessors
IEEE Transactions on Computers
The network architecture of the Connection Machine CM-5 (extended abstract)
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
The process group approach to reliable distributed computing
Communications of the ACM
An evaluation of directory protocols for medium-scale shared-memory multiprocessors
ICS '94 Proceedings of the 8th international conference on Supercomputing
Concurrency control in asynchronous computations
Concurrency control in asynchronous computations
Efficient support for irregular applications on distributed-memory machines
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Lamport clocks: verifying a directory cache-coherence protocol
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Performance of Pruning-Cache Directories for Large-Scale Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Tree-Based Multicasting on Wormhole Routed Multistage Interconnection Networks
ICPP '97 Proceedings of the international Conference on Parallel Processing
Efficient Multicast on Myrinet using Link-Level Flow Control
ICPP '98 Proceedings of the 1998 International Conference on Parallel Processing
Using cache memory to reduce processor-memory traffic
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Using Lamport Clocks to Reason About Relaxed Memory Models
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
A Hardware Multicast Routing Algorithm for Two-Dimensional Meshes
SPDP '96 Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing (SPDP '96)
Logical time coherence maintenance
Logical time coherence maintenance
Architecture and design of AlphaServer GS320
ACM SIGPLAN Notices
Timestamp snooping: an approach for extending SMPs
ACM SIGPLAN Notices
Exploiting Network Locality for CC-NUMA Multiprocessors
The Journal of Supercomputing
Architecture and design of AlphaServer GS320
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Timestamp snooping: an approach for extending SMPs
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
MemorIES3: a programmable, real-time hardware emulation tool for multiprocessor server design
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol
IEEE Transactions on Parallel and Distributed Systems
IEEE Concurrency
The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Token coherence: decoupling performance and correctness
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the 30th annual international symposium on Computer architecture
Verifying Sequential Consistency on Shared-Memory Multiprocessors by Model Checking
IEEE Transactions on Parallel and Distributed Systems
STICS: SCSI-to-IP cache for storage area networks
Journal of Parallel and Distributed Computing
Interconnect-Aware Coherence Protocols for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 4th international conference on Computing frontiers
The Power of Priority: NoC Based Distributed Cache Coherency
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
In-Network Caching for Chip Multiprocessors
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Token tenure: PATCHing token counting using directory-based cache coherence
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
In-network coherence filtering: snoopy coherence without broadcasts
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
A two-level directory organization solution for CC-NUMA systems
ICA3PP'07 Proceedings of the 7th international conference on Algorithms and architectures for parallel processing
Virtual point-to-point connections for NoCs
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems - Special issue on the 2009 ACM/IEEE international symposium on networks-on-chip
Token tenure and PATCH: A predictive/adaptive token-counting hybrid
ACM Transactions on Architecture and Code Optimization (TACO)
Subspace snooping: filtering snoops with operating system support
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An adaptive cache coherence protocol for chip multiprocessors
Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Using partial tag comparison in low-power snoop-based chip multiprocessors
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Predicting Coherence Communication by Tracking Synchronization Points at Run Time
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
An efficient, low-cost routing framework for convex mesh partitions to support virtualization
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Wireless Health Systems, On-Chip and Off-Chip Network Architectures
Using in-flight chains to build a scalable cache coherence protocol
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
This paper proposes a new coherence method called "multicast snooping" that dynamically adapts between broadcast snooping and a directory protocol. Multicast snooping is unique because processors predict which caches should snoop each coherence transaction by specifying a multicast "mask." Transactions are delivered with an ordered multicast network, such as an Isotach network, which eliminates the need for acknowledgment messages. Processors handle transactions as they would with a snooping protocol, while a simplified directory operates in parallel to check masks and gracefully handle incorrect ones (e.g., previous owner missing). Preliminary performance numbers with mostly SPLASH-2 benchmarks running on 32 processors show that we can limit multicasts to an average of 2-6 destinations (<< 32) and we can deliver 2-5 multicasts per network cycle (>> broadcast snooping's 1 per cycle). While these results do not include timing, they do provide encouragement that multicast snooping can obtain data directly (like broadcast snooping) but apply to larger systems (like directories).