Let's route packets instead of wires
AUSCRYPT '90 Proceedings of the sixth MIT conference on Advanced research in VLSI
Page placement algorithms for large real-indexed caches
ACM Transactions on Computer Systems (TOCS)
Parallel programming in Split-C
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
IBM Systems Journal
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Architecture and design of AlphaServer GS320
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
The Alpha 21364 Network Architecture
IEEE Micro
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
The impact of shared-cache clustering in small-scale shared-memory multiprocessors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
ALOHA packet system with and without slots and capture
ACM SIGCOMM Computer Communication Review
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling
Proceedings of the 32nd annual international symposium on Computer Architecture
Design tradeoffs for tiled CMP on-chip networks
Proceedings of the 20th annual international conference on Supercomputing
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Leveraging Optical Technology in Future Bus-based Chip Multiprocessors
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Interconnect design considerations for large NUCA caches
Proceedings of the 34th annual international symposium on Computer architecture
Characterizing the Cell EIB On-Chip Network
IEEE Micro
Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
RF interconnects for communications on-chip
Proceedings of the 2008 international symposium on Physical design
Corona: System Implications of Emerging Nanophotonic Technology
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Power reduction of CMP communication networks via RF-interconnects
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Low-cost router microarchitecture for on-chip networks
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
An analysis of on-chip interconnection networks for large-scale chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
An intra-chip free-space optical interconnect
Proceedings of the 37th annual international symposium on Computer architecture
Silicon Nanophotonic Network-on-Chip Using TDM Arbitration
HOTI '10 Proceedings of the 2010 18th IEEE Symposium on High Performance Interconnects
TLSync: support for multiple fast barriers using on-chip transmission lines
Proceedings of the 38th annual international symposium on Computer architecture
Efficient data streaming with on-chip accelerators: Opportunities and challenges
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
TLSync: support for multiple fast barriers using on-chip transmission lines
Proceedings of the 38th annual international symposium on Computer architecture
A design space exploration of transmission-line links for on-chip interconnect
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Enhancing effective throughput for transmission line-based bus
Proceedings of the 39th Annual International Symposium on Computer Architecture
An on-chip global broadcast network design with equalized transmission lines in the 1024-core era
Proceedings of the International Workshop on System Level Interconnect Prediction
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
As microprocessor chips integrate a growing number of cores, the issue of interconnection becomes more important for overall system performance and efficiency. Compared to traditional distributed shared-memory architecture, chip-multiprocessors offer a different set of design constraints and opportunities. As a result, a conventional packet-relay multiprocessor interconnect architecture is a valid, but not necessarily optimal, design point. For example, the advantage of off-the-shelf interconnect and the in-field scalability of the interconnect are less important in a chip-multiprocessor. On the other hand, even with worsening wire delays,packet switching represents a non-trivial component of overall latency. In this paper, we show that with straight forward optimizations, the traffic between different cores can be kept relatively low. This in turn allows simple shared-medium interconnects to be built using communication circuits driving transmission lines. This architecture offers extremely low latencies and can support a large number of cores without the need for packet switching, eliminating costly routers.