Layout-Accurate Design and Implementation of a High-Throughput Interconnection Network for Single-Chip Parallel Processing

  • Authors:
  • Aydin O. Balkan; Michael N. Horak; Gang Qu; Uzi Vishkin

  • Affiliations:
  • UMD; UMD; UMD; UMD

  • Venue:
  • HOTI '07 Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects
  • Year:
  • 2007

Abstract

A Mesh of Trees (MoT) on-chip interconnection network has been proposed recently to provide high throughput between memory units and processors for single-chip parallel processing [5]. In this paper, we report our findings in bringing this concept to silicon. Specifically, we conduct cycle-accurate Verilog simulations to verify the analytical results claimed in [5]. We synthesize and obtain the layout of MoT interconnection networks of various sizes. To further improve throughput, we investigate different arbitration primitives to handle load and store, the two most common memory operations. We also study the use of pipeline registers in large networks when there are long wires. Simulation based on full network layout demonstrates that significant throughput improvement can be achieved over the originally proposed MoT interconnection network.

The importance of this work lies in its validation of performance features of the MoT interconnection network, as they were previously shown to be competitive with traditional network solutions. The MoT network is currently used in an eXplicit Multi-Threading (XMT) on-chip parallel processor, which is engineered to support parallel programming. In that context, a 32-terminal MoT network could support up to 512 on-chip XMT processors. Our 8-terminal network, which could serve 8 processor clusters (or 128 total processors), was also accepted recently for fabrication.
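
The two network sizes quoted in the abstract imply the same cluster size, which a short calculation makes explicit. The sketch below is illustrative only: the function names are hypothetical, and it assumes one processor cluster per network terminal, which the abstract states for the 8-terminal network and which is taken here as an assumption for the 32-terminal one.

```python
# Figures quoted in the abstract: network terminals -> total processors.
QUOTED = {8: 128, 32: 512}


def processors_per_terminal(terminals: int, processors: int) -> int:
    """Processors sharing one terminal (assumes one cluster per terminal)."""
    if processors % terminals != 0:
        raise ValueError("processors do not divide evenly among terminals")
    return processors // terminals


def total_processors(terminals: int, cluster_size: int) -> int:
    """Total processors served by a network with the given terminal count."""
    return terminals * cluster_size


# Cluster size implied by each quoted configuration.
cluster_sizes = {t: processors_per_terminal(t, p) for t, p in QUOTED.items()}
print(cluster_sizes)
```

Under that assumption both configurations work out to 16 processors per cluster, so the 32-terminal figure is the 8-terminal design scaled by the number of terminals.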