Interconnect-Aware Coherence Protocols for Chip Multiprocessors

Authors:
Liqun Cheng;Naveen Muralimanohar;Karthik Ramani;Rajeev Balasubramonian;John B. Carter
Affiliations:
University of Utah;University of Utah;University of Utah;University of Utah;University of Utah
Venue:
Proceedings of the 33rd annual international symposium on Computer Architecture
Year:
2006


Abstract

Improvements in semiconductor technology have made it possible to include multiple processor cores on a single die. Chip Multi-Processors (CMPs) are an attractive choice for future billion-transistor architectures due to their low design complexity, high clock frequency, and high throughput. In a typical CMP architecture, the L2 cache is shared by multiple cores and data coherence is maintained among private L1s. Coherence operations entail frequent communication over global on-chip wires. In future technologies, communication between different L1s will have a significant impact on overall processor performance and power consumption. On-chip wires can be designed to have different latency, bandwidth, and energy properties. Likewise, coherence protocol messages have different latency and bandwidth needs. We propose an interconnect composed of wires with varying latency, bandwidth, and energy characteristics, and advocate intelligently mapping coherence operations to the appropriate wires. In this paper, we present a comprehensive list of techniques that allow coherence protocols to exploit a heterogeneous interconnect and evaluate a subset of these techniques to show their performance and power-efficiency potential. Most of the proposed techniques can be implemented with minimal complexity overhead.
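The idea of mapping coherence messages to wires with different properties can be illustrated with a small sketch. This is not the paper's actual policy: the wire class names, the `Message` fields, the 24-bit link width, and the message sizes below are all hypothetical, chosen only to show how a message's size and criticality might select a wire class.

```python
from dataclasses import dataclass
from enum import Enum


class WireClass(Enum):
    # Hypothetical wire classes; names and properties are assumptions.
    LOW_LATENCY = "low_latency"   # fast, but only a few wires fit (narrow)
    BASELINE = "baseline"         # ordinary latency and bandwidth
    LOW_ENERGY = "low_energy"     # slower, but cheaper per bit


@dataclass(frozen=True)
class Message:
    kind: str               # illustrative label, e.g. "invalidate_ack"
    size_bits: int          # payload plus header, in bits
    on_critical_path: bool  # does a core stall waiting for this message?


# Assumed width of the low-latency links (hypothetical value).
LOW_LATENCY_WIDTH_BITS = 24


def choose_wires(msg: Message) -> WireClass:
    """Pick a wire class from the message's criticality and size."""
    if not msg.on_critical_path:
        # Nobody is waiting on it, so trade latency for energy.
        return WireClass.LOW_ENERGY
    if msg.size_bits <= LOW_LATENCY_WIDTH_BITS:
        # Small and urgent: fits on the narrow fast wires in one transfer.
        return WireClass.LOW_LATENCY
    # Urgent but wide: needs the bandwidth of the baseline wires.
    return WireClass.BASELINE


if __name__ == "__main__":
    examples = [
        Message("invalidate_ack", 8, True),
        Message("data_reply", 576, True),
        Message("writeback", 576, False),
    ]
    for m in examples:
        print(m.kind, "->", choose_wires(m).value)
```

Under these assumptions, short acknowledgements travel on the fast narrow wires, cache-line replies on the baseline wires, and writebacks that no core is waiting for on the energy-efficient wires. A real design would also have to weigh contention and ordering requirements on each wire class, which this sketch ignores.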