A composite and scalable cache coherence protocol for large scale CMPs

Authors:
Yi Xu;Yu Du;Youtao Zhang;Jun Yang
Affiliations:
University of Pittsburgh, Pittsburgh, PA, USA;University of Pittsburgh, Pittsburgh, PA, USA;University of Pittsburgh, Pittsburgh, PA, USA;University of Pittsburgh, Pittsburgh, PA, USA
Venue:
Proceedings of the international conference on Supercomputing
Year:
2011

Citing 35
Cited 3

The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems

An evaluation of directory schemes for cache coherence
25 years of the international symposia on Computer architecture (selected papers)

A Single-Chip Multiprocessor
Computer

The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques

Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture
Proceedings of the 2002 ACM/IEEE conference on Supercomputing

Token coherence: decoupling performance and correctness
Proceedings of the 30th annual international symposium on Computer architecture

JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture

Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks

Improving Multiple-CMP Systems Using Token Coherence
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture

PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems

Design tradeoffs for tiled CMP on-chip networks
Proceedings of the 20th annual international conference on Supercomputing

In-Network Cache Coherence
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture

Die Stacking (3D) Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture

Leveraging Optical Technology in Future Bus-based Chip Multiprocessors
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture

Thousand core chips: a technology perspective
Proceedings of the 44th annual Design Automation Conference

A New Solution to Coherence Problems in Multicache Systems
IEEE Transactions on Computers

Flattened Butterfly Topology for On-Chip Networks
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture

Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture

Corona: System Implications of Emerging Nanophotonic Technology
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture

Building Manycore Processor-to-DRAM Networks with Monolithic Silicon Photonics
HOTI '08 Proceedings of the 2008 16th IEEE Symposium on High Performance Interconnects

The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques

Rock: A High-Performance Sparc CMT Processor
IEEE Micro

Firefly: illuminating future network-on-chip with nanophotonics
Proceedings of the 36th annual international symposium on Computer architecture

Phastlane: a rapid transit optical routing network
Proceedings of the 36th annual international symposium on Computer architecture

Silicon-photonic clos networks for global on-chip communication
NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip

Spectrum: a hybrid nanophotonic-electric on-chip network
Proceedings of the 46th Annual Design Automation Conference

In-network coherence filtering: snoopy coherence without broadcasts
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture

A tagless coherence directory
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture

A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems

High performance cache replacement using re-reference interval prediction (RRIP)
Proceedings of the 37th annual international symposium on Computer architecture

An intra-chip free-space optical interconnect
Proceedings of the 37th annual international symposium on Computer architecture

Re-architecting DRAM memory systems with monolithically integrated silicon photonics
Proceedings of the 37th annual international symposium on Computer architecture

WAYPOINT: scaling coherence to thousand-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques

ATAC: a 1000-core cache-coherent processor with on-chip optical network
Proceedings of the 19th international conference on Parallel architectures and compilation techniques

Atomic Coherence: Leveraging nanophotonics to build race-free cache coherence protocols
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture

Channel borrowing: an energy-efficient nanophotonic crossbar architecture with light-weight arbitration
Proceedings of the 26th ACM international conference on Supercomputing

Tolerating process variations in nanophotonic on-chip networks
Proceedings of the 39th Annual International Symposium on Computer Architecture

Co-tuning of a hybrid electronic-optical network for reducing energy consumption in embedded CMPs
Proceedings of the First International Workshop on Many-core Embedded Systems


Abstract

The number of on-chip cores in modern chip multiprocessors (CMPs) is growing quickly with technology scaling. However, efficiently supporting cache coherence for large-scale CMPs remains a major challenge. Conventional snoopy and directory coherence protocols do not scale smoothly to many-core or thousand-core processors. Snoopy protocols incur a large power overhead because broadcasts trigger an enormous amount of cache tag probing. Directory protocols incur a performance penalty due to indirection and a large storage overhead for the directories themselves. This paper addresses the efficiency problem of supporting cache coherence for large-scale CMPs. By leveraging emerging optical on-chip interconnect (OP-I) technology, which provides high bandwidth density, low propagation delay, and natural support for multicast/broadcast in a hierarchical network organization, we propose a composite cache coherence (C3) protocol that benefits from direct cache-to-cache accesses, as in snoopy protocols, and a small amount of cache probing, as in directory protocols. To complete coherence transactions quickly, C3 organizes accesses in a three-tier hierarchy that combines local broadcast prediction, filtering, and a coarse-grained directory. Compared to a directory-based protocol [18], our evaluations on a thousand-core CMP show that C3 improves performance by 21%, reduces the network latency of coherence messages by 41%, and saves 5.5% of network energy consumption on average for PARSEC applications.
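
The directory storage overhead mentioned in the abstract can be illustrated with a short back-of-the-envelope calculation. The sketch below compares a textbook full-map directory (one presence bit per core) with a generic coarse-grained sharer vector (one bit per cluster of cores). The 64-byte block size, the cluster size of 16, and all function names are assumptions chosen for illustration; the abstract does not say how C3 encodes its coarse-grained directory, so this is not a description of the paper's design.

```python
# Illustrative arithmetic only: generic directory encodings, not the C3 design.

def full_map_bits(num_cores: int) -> int:
    """Presence bits per directory entry with one bit per core."""
    return num_cores


def coarse_vector_bits(num_cores: int, cluster_size: int) -> int:
    """Presence bits per entry when one bit covers a whole cluster of cores."""
    return -(-num_cores // cluster_size)  # ceiling division


def overhead_ratio(bits_per_entry: int, block_bytes: int = 64) -> float:
    """Sharer bits relative to the data bits of the block they track.

    block_bytes=64 is an assumed cache block size.
    """
    return bits_per_entry / (block_bytes * 8)


if __name__ == "__main__":
    cores = 1024  # thousand-core scale, as in the abstract
    full = full_map_bits(cores)
    coarse = coarse_vector_bits(cores, cluster_size=16)  # assumed cluster size
    print(f"full-map: {full} bits/entry, overhead {overhead_ratio(full):.1%}")
    print(f"coarse:   {coarse} bits/entry, overhead {overhead_ratio(coarse):.1%}")
```

Under these assumptions the full-map sharer vector is twice the size of the data block it tracks, while the coarse vector is one eighth of it. The price of the coarse encoding is precision: a set bit only identifies a cluster, so a coherence request has to reach every core in that cluster, which is one reason coarse directories pair naturally with local broadcast.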