ACM Computing Surveys (CSUR)
Cache Invalidation Patterns in Shared-Memory Multiprocessors
IEEE Transactions on Computers
The performance of cache-coherent ring-based multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Scalable cache consistency for hierarchically structured multiprocessors
The Journal of Supercomputing
STiNG: a CC-NUMA computer system for the commercial marketplace
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Ethernet: distributed packet switching for local computer networks
Communications of the ACM
Architecture and design of AlphaServer GS320
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
Full-system timing-first simulation
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol
IEEE Transactions on Parallel and Distributed Systems
Starfire: Extending the SMP Envelope
IEEE Micro
Exploring the Design Space of Future CMPs
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Self-Timed Ring for Globally-Asynchronous Locally-Synchronous Systems
ASYNC '03 Proceedings of the 9th International Symposium on Asynchronous Circuits and Systems
Temporal verification of carrier-sense local area network protocols
POPL '84 Proceedings of the 11th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
PANDA: Ring-Based Multiprocessor System Using New Snooping Protocol
ICPADS '98 Proceedings of the 1998 International Conference on Parallel and Distributed Systems
Token coherence: decoupling performance and correctness
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the 30th annual international symposium on Computer architecture
The Alpha 21364 Network Architecture
HOTI '01 Proceedings of the The Ninth Symposium on High Performance Interconnects
Improving Multiple-CMP Systems Using Token Coherence
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling
Proceedings of the 32nd annual international symposium on Computer Architecture
uComplexity: Estimating Processor Design Effort
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Cooperative Caching for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
POWER5 System microarchitecture
IBM Journal of Research and Development - POWER5 and packaging
Virtual hierarchies to support server consolidation
Proceedings of the 34th annual international symposium on Computer architecture
A consistency architecture for hierarchical shared caches
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Extending CC-NUMA systems to support write update optimizations
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Scalable and reliable communication for hardware transactional memory
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Token tenure: PATCHing token counting using directory-based cache coherence
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Routing in self-organizing nano-scale irregular networks
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Physical-Aware Link Allocation and Route Assignment for Chip Multiprocessing
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Token tenure and PATCH: A predictive/adaptive token-counting hybrid
ACM Transactions on Architecture and Code Optimization (TACO)
Subspace snooping: filtering snoops with operating system support
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
A workload-adaptive and reconfigurable bus architecture for multicore processors
International Journal of Reconfigurable Computing
Run-time energy management of manycore systems through reconfigurable interconnects
Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
Improving coherence protocol reactiveness by trading bandwidth for latency
Proceedings of the 9th conference on Computing Frontiers
A hybrid NoC design for cache coherence optimization for chip multiprocessors
Proceedings of the 49th Annual Design Automation Conference
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
Ring interconnects may be an attractive solution for future chip multiprocessors because they can enable faster links than buses and simpler switches than arbitrary switched interconnects. Moreover, a ring naturally orders requests sufficiently to enable directory-less coherence, but not in the total order that buses provide for snooping coherence. Existing cache coherence protocols for rings either establish a (total) ordering point (ORDERING-POINT) or use a greedy order (GREEDY-ORDER) with unbounded retries. In this work, we propose a new class of ring protocols, RINGORDER, in which requests complete in ring position order to achieve two benefits. First, RING-ORDER improves performance relative to ORDERING-POINT by activating requests immediately instead of waiting for them to reach the ordering point. Second, it improves performance stability relative to GREEDY-ORDER by not using retries. Thus, the new RING-ORDER combines the best of ORDERING-POINT (good performance stability) with the best of GREEDY-ORDER (good average performance).