A Family of Fault-Tolerant Routing Protocols for Direct Multiprocessor Networks
IEEE Transactions on Parallel and Distributed Systems
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Multicast snooping: a new coherence method using a multicast address network
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
SoCBUS: Switched Network on Chip for Hard Real Time Embedded Systems
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Low-Latency Virtual-Channel Routers for On-Chip Networks
Proceedings of the 31st annual international symposium on Computer architecture
Switch Design to Enable Predictive Multiplexed Switching in Multiprocessor Networks
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
An Energy-Efficient Reconfigurable Circuit-Switched Network-on-Chip
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking
Proceedings of the 32nd annual international symposium on Computer Architecture
A Hybrid SoC Interconnect with Dynamic TDMA-Based Transaction-Less Buses and On-Chip Networks
VLSID '06 Proceedings of the 19th International Conference on VLSI Design held jointly with 5th International Conference on Embedded Systems Design
Express virtual channels: towards the ideal interconnection fabric
Proceedings of the 34th annual international symposium on Computer architecture
Proceedings of the 34th annual international symposium on Computer architecture
Flattened Butterfly Topology for On-Chip Networks
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
An Evaluation of Server Consolidation Workloads for Multi-Core Designs
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Token tenure: PATCHing token counting using directory-based cache coherence
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A case for bufferless routing in on-chip networks
Proceedings of the 36th annual international symposium on Computer architecture
Outstanding research problems in NoC design: system, microarchitecture, and circuit perspectives
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
SCARAB: a single cycle adaptive routing and bufferless network
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Impact of Half-Duplex and Full-Duplex DMA Implementations on NoC Performance
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Token tenure and PATCH: A predictive/adaptive token-counting hybrid
ACM Transactions on Architecture and Code Optimization (TACO)
NoC-aware cache design for chip multiprocessors
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Thread criticality support in on-chip networks
Proceedings of the Third International Workshop on Network on Chip Architectures
Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
NoC-aware cache design for multithreaded execution on tiled chip multiprocessors
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Inferring packet dependencies to improve trace based simulation of on-chip networks
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Proceedings of the 8th ACM International Conference on Computing Frontiers
Clustered NOC, a suitable design for group communications in Network on Chip
Computers and Electrical Engineering
Concurrent hybrid switching for massively parallel systems-on-chip: the CYBER architecture
Proceedings of the 9th conference on Computing Frontiers
Predicting Coherence Communication by Tracking Synchronization Points at Run Time
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Hierarchical and multiple switching NoC with floorplan based adaptability
ARC'13 Proceedings of the 9th international conference on Reconfigurable Computing: architectures, tools, and applications
Proactive circuit allocation in multiplane NoCs
Proceedings of the 50th Annual Design Automation Conference
TornadoNoC: A lightweight and scalable on-chip network architecture for the many-core era
ACM Transactions on Architecture and Code Optimization (TACO)
PAIS: Parallelism-aware interconnect scheduling in multicores
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.00 |
Our characterization of a suite of commercial and scientific workloads on a 16-core cache-coherent chip multiprocessor (CMP) shows that overall system performance is sensitive to on-chip communication latency, and can degrade by 20% or more due to long interconnect latencies. On the other hand, communication bandwidth demand is low. These results prompt us to explore circuit-switched networks. Circuit-switched networks can significantly lower the communication latency between processor cores, when compared to packet-switched networks, since once circuits are set up, communication latency approaches pure interconnect delay. However, if circuits are not frequently reused, the long setup time can hurt overall performance, as is demonstrated by the poor performance of traditional circuit-switched networks -- all applications saw a slowdown rather than a speedup with a traditional circuit-switched network.To combat this problem, we propose hybrid circuit switching (HCS), a network design which removes the circuit setup time overhead by intermingling packet-switched flits with circuit-switched flits. Additionally, we co-design a prediction-based coherence protocol that leverages the existence of circuits to optimize pair-wise sharing between cores. The protocol allows pair-wise sharers to communicate directly with each other via circuits and drives up circuit reuse. Circuit-switched coherence provides up to 23% savings in network latency which leads to an overall system performance improvement of up to 15%.In short, we show HCS delivering the latency benefits of circuit switching,while sustaining the throughput benefits of packet switching, in a design realizable with low area and power overhead.