In-Network Cache Coherence

Authors:
Noel Eisley;Li-Shiuan Peh;Li Shang
Affiliations:
Princeton University;Princeton University;Queen's University
Venue:
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2006

Citing 26
Cited 29

An evaluation of directory schemes for cache coherence

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
The Wisconsin multicube: a new large-scale cache-coherent multiprocessor

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Introducing memory into the switch elements of multiprocessor interconnection networks

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
LimitLESS directories: A scalable cache coherence scheme

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The DASH prototype: implementation and performance

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The GLOW cache coherence protocol extensions for widely shared data

ICS '96 Proceedings of the 10th international conference on Supercomputing
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Exploiting the Benefits of Multiple-Path Network in DSM Systems: Architectural Alternatives and Performance Evaluation

IEEE Transactions on Computers - Special issue on cache memory and related problems
Multicast snooping: a new coherence method using a multicast address network

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
CACHET: an adaptive cache coherence protocol for distributed shared-memory systems

ICS '99 Proceedings of the 13th international conference on Supercomputing
Bochs: A Portable PC Emulator for Unix/X

Linux Journal
Piranha: a scalable architecture based on single-chip multiprocessing

Proceedings of the 27th annual international symposium on Computer architecture
Two economical directory schemes for large-scale cache coherent multiprocessors

ACM SIGARCH Computer Architecture News
Shared Memory Consistency Models: A Tutorial

Computer
The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs

IEEE Micro
Performance Evaluation of the Slotted Ring Multiprocessor

IEEE Transactions on Computers
The Murphi Verification System

CAV '96 Proceedings of the 8th International Conference on Computer Aided Verification
Token coherence: decoupling performance and correctness

Proceedings of the 30th annual international symposium on Computer architecture
Computer Architecture: A Quantitative Approach

Computer Architecture: A Quantitative Approach
The Alpha 21364 Network Architecture

HOTI '01 Proceedings of the The Ninth Symposium on High Performance Interconnects
A New Scalable Directory Architecture for Large-Scale Multiprocessors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A Delay Model and Speculative Architecture for Pipelined Routers

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Principles and Practices of Interconnection Networks

Principles and Practices of Interconnection Networks
Speculative Incoherent Cache Protocols

IEEE Micro
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors

Proceedings of the 32nd annual international symposium on Computer Architecture
Interconnect-Aware Coherence Protocols for Chip Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture

Proximity-aware directory-based coherence for multi-core processor architectures

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
A novel dimensionally-decomposed router for on-chip communication in 3D architectures

Proceedings of the 34th annual international symposium on Computer architecture
The Power of Priority: NoC Based Distributed Cache Coherency

NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Research Challenges for On-Chip Interconnection Networks

IEEE Micro
A consistency architecture for hierarchical shared caches

Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Comparison of memory write policies for NoC based multicore cache coherent systems

Proceedings of the conference on Design, automation and test in Europe
Extending CC-NUMA systems to support write update optimizations

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Improving support for locality and fine-grain sharing in chip multiprocessors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Leveraging on-chip networks for data cache migration in chip multiprocessors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
In-Network Caching for Chip Multiprocessors

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Virtual tree coherence: Leveraging regions and in-network multicast trees for scalable cache coherence

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Push-assisted migration of real-time tasks in multi-core processors

Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
81.6 GOPS object recognition processor based on a memory-centric NoC

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
In-network coherence filtering: snoopy coherence without broadcasts

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
An analysis of on-chip interconnection networks for large-scale chip multiprocessors

ACM Transactions on Architecture and Code Optimization (TACO)
Subspace snooping: filtering snoops with operating system support

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proximity coherence for chip multiprocessors

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
SWEL: hardware cache coherence protocols to map shared data onto shared caches

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Latency criticality aware on-chip communication

Proceedings of the Conference on Design, Automation and Test in Europe
An adaptive cache coherence protocol for chip multiprocessors

Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
A workload-adaptive and reconfigurable bus architecture for multicore processors

International Journal of Reconfigurable Computing
Run-time energy management of manycore systems through reconfigurable interconnects

Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
A composite and scalable cache coherence protocol for large scale CMPs

Proceedings of the international conference on Supercomputing
Inferring packet dependencies to improve trace based simulation of on-chip networks

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
An optimized multicore cache coherence design for exploiting communication locality

Proceedings of the great lakes symposium on VLSI
Concerning with on-chip network features to improve cache coherence protocols for CMPs

ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Predicting Coherence Communication by Tracking Synchronization Points at Run Time

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Efficient multicast schemes for 3-D Networks-on-Chip

Journal of Systems Architecture: the EUROMICRO Journal
Bandwidth Adaptive Cache Coherence Optimizations for Chip Multiprocessors

International Journal of Parallel Programming

Quantified Score

Hi-index	0.00

Visualization

Abstract

With the trend towards increasing number of processor cores in future chip architectures, scalable directory-based protocols for maintaining cache coherence will be needed. However, directory-based protocols face well-known problems in delay and scalability. Most current protocol optimizations targeting these problems maintain a firm abstraction of the interconnection network fabric as a communication medium: protocol optimizations consist of endto- end messages between requestor, directory and sharer nodes, while network optimizations separately target lowering communication latency for coherence messages. In this paper, we propose an implementation of the cache coherence protocol within the network, embedding directories within each router node that manage and steer requests towards nearby data copies, enabling in-transit optimization of memory access delay. Simulation results across a range of SPLASH-2 benchmarks demonstrate significant performance improvement and good system scalability, with up to 44.5% and 56% savings in average memory access latency for 16 and 64-node systems, respectively, when compared against the baseline directory cache coherence protocol. Detailed microarchitecture and implementation characterization affirms the low area and delay impact of in-network coherence.