Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling

Authors:
Rakesh Kumar; Victor Zyuban; Dean M. Tullsen
Affiliations:
University of California at San Diego; IBM TJ Watson Research Center; University of California at San Diego
Venue:
Proceedings of the 32nd annual international symposium on Computer Architecture
Year:
2005


Abstract

This paper examines the area, power, performance, and design issues for the on-chip interconnects of a chip multiprocessor, attempting to present a comprehensive view of a class of interconnect architectures. It shows that the design choices for the interconnect have a significant effect on the rest of the chip, with the interconnect potentially consuming a large fraction of the real estate and power budget. Designs that treat the interconnect as an entity that can be architected and optimized independently therefore do not arrive at the best multi-core design. Several examples are presented showing the need for careful co-design. For instance, increasing interconnect bandwidth requires area that then constrains the number of cores or the cache sizes, and so does not necessarily increase performance. Also, shared level-2 caches become significantly less attractive when the overhead of the resulting crossbar is accounted for. A hierarchical bus structure is examined that negates some of the performance costs of the assumed baseline architecture.
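
The area trade-off described in the abstract can be illustrated with a toy budget calculation. This is only a sketch under stated assumptions: the function name, the linear area-per-bit model, and every number used here are hypothetical and are not taken from the paper, whose actual area and power models are not reproduced on this page.

```python
# Toy area-budget model. All parameters are hypothetical illustrations,
# not figures or formulas from the paper.

def cores_that_fit(die_area_mm2, core_area_mm2, cache_area_mm2,
                   bus_width_bits, area_per_bit_mm2):
    """Return how many cores fit once cache and interconnect area are reserved.

    Assumes (for illustration only) that interconnect area grows linearly
    with bus width.
    """
    interconnect_area = bus_width_bits * area_per_bit_mm2
    remaining = die_area_mm2 - cache_area_mm2 - interconnect_area
    # Clamp at zero: a negative remainder means nothing else fits.
    return max(0, int(remaining // core_area_mm2))


if __name__ == "__main__":
    # Widening the hypothetical bus leaves room for fewer cores.
    for width in (256, 512, 1024):
        print(width, cores_that_fit(400, 25, 100, width, 0.1))
```

In this toy setting, widening the bus shrinks the area left over for cores, which is one way to see why extra interconnect bandwidth need not translate into higher performance for the whole chip.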