Circuit-Switched Coherence

Authors:
Natalie D. Enright Jerger;Li-Shiuan Peh;Mikko H. Lipasti
Venue:
NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
Year:
2008

Abstract

Our characterization of a suite of commercial and scientific workloads on a 16-core cache-coherent chip multiprocessor (CMP) shows that overall system performance is sensitive to on-chip communication latency, and can degrade by 20% or more due to long interconnect latencies. On the other hand, communication bandwidth demand is low. These results prompt us to explore circuit-switched networks.

Circuit-switched networks can significantly lower the communication latency between processor cores when compared to packet-switched networks, since once circuits are set up, communication latency approaches pure interconnect delay. However, if circuits are not frequently reused, the long setup time can hurt overall performance. The poor performance of traditional circuit-switched networks demonstrates this: all applications saw a slowdown rather than a speedup with a traditional circuit-switched network.

To combat this problem, we propose hybrid circuit switching (HCS), a network design that removes the circuit setup time overhead by intermingling packet-switched flits with circuit-switched flits. Additionally, we co-design a prediction-based coherence protocol that leverages the existence of circuits to optimize pair-wise sharing between cores. The protocol allows pair-wise sharers to communicate directly with each other via circuits and drives up circuit reuse. Circuit-switched coherence provides up to 23% savings in network latency, which leads to an overall system performance improvement of up to 15%.

In short, we show HCS delivering the latency benefits of circuit switching, while sustaining the throughput benefits of packet switching, in a design realizable with low area and power overhead.
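
The trade-off between circuit setup time and circuit reuse described above can be illustrated with a toy latency model. This is a minimal sketch and not the paper's evaluation methodology: the function names, the per-hop cycle counts (a 3-cycle packet-switched router, a 1-cycle circuit bypass, a 1-cycle link), and the setup cost are all illustrative assumptions chosen only to show how setup overhead is amortized over reuse.

```python
import math


def packet_latency(hops, router_cycles=3, link_cycles=1):
    """Per-message latency when every hop pays the full router pipeline.

    The cycle counts are assumed values, not figures from the paper.
    """
    return hops * (router_cycles + link_cycles)


def circuit_latency(hops, reuses, setup_cycles, bypass_cycles=1, link_cycles=1):
    """Average per-message latency over a circuit used `reuses` times.

    The one-time setup cost is spread evenly across all messages that
    travel on the circuit before it is torn down.
    """
    if reuses < 1:
        raise ValueError("a circuit must carry at least one message")
    return setup_cycles / reuses + hops * (bypass_cycles + link_cycles)


def break_even_reuses(hops, setup_cycles, router_cycles=3, bypass_cycles=1):
    """Smallest reuse count at which the circuit is no slower on average."""
    saved_per_message = hops * (router_cycles - bypass_cycles)
    if saved_per_message <= 0:
        raise ValueError("the circuit must save cycles per hop to break even")
    return math.ceil(setup_cycles / saved_per_message)


if __name__ == "__main__":
    hops, setup = 4, 16  # hypothetical path length and setup cost in cycles
    print("packet-switched:", packet_latency(hops))
    for reuses in (1, 2, 4):
        print(f"circuit, {reuses} use(s):", circuit_latency(hops, reuses, setup))
    print("break-even reuses:", break_even_reuses(hops, setup))
    # A setup cost of zero stands in for hiding setup overhead entirely.
    print("circuit, no setup cost:", circuit_latency(hops, 1, 0))
```

Under these assumed numbers, a circuit used only once is slower than packet switching, while a circuit reused often enough, or one whose setup cost is hidden, is faster. This mirrors the abstract's two points: traditional circuit switching suffers when reuse is low, and both removing setup overhead and driving up reuse help.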