Reducing the Interconnection Network Cost of Chip Multiprocessors

Authors:
Pablo Abad;Valentin Puente;Jose Angel Gregorio
Venue:
NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
Year:
2008

Abstract

This paper introduces a cost-effective technique for meeting the requirements of CMP coherence protocols from the interconnection network's point of view. It presents a mechanism that avoids the end-to-end deadlock arising from the dependency chains created at the network interfaces between the different message types that coherence protocols handle. The proposal guarantees a fraction of end-to-end bandwidth for the highest-priority messages, which makes it unnecessary to employ several virtual networks or complex mechanisms to cope with the limited capacity of the endpoint buffers. The approach takes the Rotary Router as its starting point and extends that router's original mechanism for routing-dependent deadlock to cover message-dependent deadlock. We also propose a solution that guarantees point-to-point message ordering in this router, a common requirement of some coherence protocols. Results for synthetic and parallel applications show that the proposal improves on the performance of previous solutions at a much lower hardware cost.