A case for globally shared-medium on-chip interconnect

Authors:
Aaron Carpenter;Jianyun Hu;Jie Xu;Michael Huang;Hui Wu
Affiliations:
University of Rochester, Rochester, NY, USA (all authors)
Venue:
Proceedings of the 38th annual international symposium on Computer architecture
Year:
2011

Abstract

As microprocessor chips integrate a growing number of cores, the interconnect becomes increasingly important for overall system performance and efficiency. Compared to traditional distributed shared-memory architectures, chip-multiprocessors offer a different set of design constraints and opportunities. As a result, a conventional packet-relay multiprocessor interconnect architecture is a valid, but not necessarily optimal, design point. For example, the advantage of off-the-shelf interconnect and the in-field scalability of the interconnect are less important in a chip-multiprocessor. On the other hand, even with worsening wire delays, packet switching represents a non-trivial component of overall latency. In this paper, we show that with straightforward optimizations, the traffic between different cores can be kept relatively low. This in turn allows simple shared-medium interconnects to be built using communication circuits driving transmission lines. This architecture offers extremely low latencies and can support a large number of cores without the need for packet switching, eliminating costly routers.