Fat-trees: universal networks for hardware-efficient supercomputing
IEEE Transactions on Computers
Express Cubes: Improving the Performance of k-ary n-cube Interconnection Networks
IEEE Transactions on Computers
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Spider: A High-Speed Network Interconnect
IEEE Micro
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Networks on Chip: A New Paradigm for Systems on Chip Design
Proceedings of the conference on Design, automation and test in Europe
Power-driven Design of Router Microarchitectures in On-chip Networks
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Low-Latency Virtual-Channel Routers for On-Chip Networks
Proceedings of the 31st annual international symposium on Computer architecture
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Microarchitectural Wire Management for Performance and Power in Partitioned Architectures
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
A low latency router supporting adaptivity for on-chip interconnects
Proceedings of the 42nd annual Design Automation Conference
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling
Proceedings of the 32nd annual international symposium on Computer Architecture
A NUCA substrate for flexible CMP cache sharing
Proceedings of the 19th annual international conference on Supercomputing
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
3D Chip Stack Technology Using Through-Chip Interconnects
IEEE Design & Test
Implementation analysis of NoC: a MPSoC trace-driven approach
GLSVLSI '06 Proceedings of the 16th ACM Great Lakes symposium on VLSI
A survey of research and practices of Network-on-chip
ACM Computing Surveys (CSUR)
Cooperative Caching for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Interconnect-Aware Coherence Protocols for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Design tradeoffs for tiled CMP on-chip networks
Proceedings of the 20th annual international conference on Supercomputing
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
ASR: Adaptive Selective Replication for CMP Caches
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
NoC Topologies Exploration based on Mapping and Simulation Models
DSD '07 Proceedings of the 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools
A Domain-Specific On-Chip Network Design for Large Scale Cache Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Flattened Butterfly Topology for On-Chip Networks
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
ORION 2.0: a fast and accurate NoC power and area model for early-stage design space exploration
Proceedings of the Conference on Design, Automation and Test in Europe
Evaluating Bufferless Flow Control for On-chip Networks
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
A case for globally shared-medium on-chip interconnect
Proceedings of the 38th annual international symposium on Computer architecture
BOFAR: buffer occupancy factor based adaptive router for mesh NoCs
Proceedings of the 4th International Workshop on Network on Chip Architectures
Packet chaining: efficient single-cycle allocation for on-chip networks
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Enhancing effective throughput for transmission line-based bus
Proceedings of the 39th Annual International Symposium on Computer Architecture
Cost-effective contention avoidance in a CMP with shared memory controllers
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Characterization and cost-efficient selection of NoC topologies for general purpose CMPs
Proceedings of the 2013 Interconnection Network Architecture: On-Chip, Multi-Chip
ZSim: fast and accurate microarchitectural simulation of thousand-core systems
Proceedings of the 40th Annual International Symposium on Computer Architecture
A fast, source-synchronous ring-based network-on-chip design
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
An energy-aware online task mapping algorithm in NoC-based system
The Journal of Supercomputing
McRouter: multicast within a router for high performance network-on-chips
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
An analytical model for on-chip interconnects in multimedia embedded systems
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on ESTIMedia'10
Design space exploration of on-chip ring interconnection for a CPU-GPU heterogeneous architecture
Journal of Parallel and Distributed Computing
Energy-aware fault-tolerant network-on-chips for addressing multiple traffic classes
Microprocessors & Microsystems
X-Network: An area-efficient and high-performance on-chip wormhole interconnect network
Microprocessors & Microsystems
Hi-index | 0.00 |
With the number of cores of chip multiprocessors (CMPs) rapidly growing as technology scales down, connecting the different components of a CMP in a scalable and efficient way becomes increasingly challenging. In this article, we explore the architectural-level implications of interconnection network design for CMPs with up to 128 fine-grain multithreaded cores. We evaluate and compare different network topologies using accurate simulation of the full chip, including the memory hierarchy and interconnect, and using a diverse set of scientific and engineering workloads. We find that the interconnect has a large impact on performance, as it is responsible for 60% to 75% of the miss latency. Latency, and not bandwidth, is the primary performance constraint, since, even with many threads per core and workloads with high miss rates, networks with enough bandwidth can be efficiently implemented for the system scales we consider. From the topologies we study, the flattened butterfly consistently outperforms the mesh and fat tree on all workloads, leading to performance advantages of up to 22%. We also show that considering interconnect and memory hierarchy together when designing large-scale CMPs is crucial, and neglecting either of the two can lead to incorrect conclusions. Finally, the effect of the interconnect on overall performance becomes more important as the number of cores increases, making interconnection choices especially critical when scaling up.