Layout-Accurate Design and Implementation of a High-Throughput Interconnection Network for Single-Chip Parallel Processing

Authors:
Aydin O. Balkan; Michael N. Horak; Gang Qu; Uzi Vishkin
Affiliations:
UMD; UMD; UMD; UMD
Venue:
HOTI '07 Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects
Year:
2007

Citing 0
Cited 11

FPGA-based prototype of a PRAM-on-chip processor
Proceedings of the 5th conference on Computing frontiers

An area-efficient high-throughput hybrid interconnection network for single-chip parallel processing
Proceedings of the 45th annual Design Automation Conference

A pilot study to compare programming effort for two parallel programming models
Journal of Systems and Software

Mesh-of-trees and alternative interconnection networks for single-chip parallelism
IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Algorithmic approach to designing an easy-to-program system: Can it lead to a HW-enhanced programmer's workflow add-on?
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design

A Low-Overhead Asynchronous Interconnection Network for GALS Chip Multiprocessors
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip

Using simple abstraction to reinvent computing for parallelism
Communications of the ACM

A low-latency adaptive asynchronous interconnection network using bi-modal router nodes
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip

OTIS-MOT: an efficient interconnection network for parallel processing
The Journal of Supercomputing

Better speedups using simpler parallel programming for graph connectivity and biconnectivity
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores

Thermal management of a many-core processor under fine-grained parallelism
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing

Abstract

A Mesh of Trees (MoT) on-chip interconnection network has been proposed recently to provide high throughput between memory units and processors for single-chip parallel processing [5]. In this paper, we report our findings in bringing this concept to silicon. Specifically, we conduct cycle-accurate Verilog simulations to verify the analytical results claimed in [5]. We synthesize and obtain the layout of MoT interconnection networks of various sizes. To further improve throughput, we investigate different arbitration primitives to handle load and store, the two most common memory operations. We also study the use of pipeline registers in large networks with long wires. Simulation based on the full network layout demonstrates that significant throughput improvement can be achieved over the originally proposed MoT interconnection network.

The importance of this work lies in its validation of the performance features of the MoT interconnection network, which were previously shown to be competitive with traditional network solutions. The MoT network is currently used in an eXplicit Multi-Threading (XMT) on-chip parallel processor, which is engineered to support parallel programming. In that context, a 32-terminal MoT network could support up to 512 on-chip XMT processors. Our 8-terminal network, which could serve 8 processor clusters (or 128 processors in total), was also recently accepted for fabrication.
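
The processor counts quoted in the abstract follow from simple arithmetic, sketched below in Python. The sketch assumes that each network terminal serves one processor cluster and that every cluster has the same size; the cluster size of 16 is not stated in the abstract but is inferred from its two data points (8 terminals for 128 processors, 32 terminals for up to 512). The names `PROCESSORS_PER_CLUSTER` and `max_processors` are hypothetical and used only for illustration.

```python
# Assumption: one MoT terminal serves one processor cluster, and all clusters
# are the same size. The size is inferred from the abstract's figures
# (8 clusters serving 128 processors), not stated there directly.
PROCESSORS_PER_CLUSTER = 128 // 8  # 16


def max_processors(terminals: int, per_cluster: int = PROCESSORS_PER_CLUSTER) -> int:
    """Upper bound on processors served by a network with `terminals` terminals."""
    if terminals < 0 or per_cluster < 0:
        raise ValueError("terminals and per_cluster must be non-negative")
    return terminals * per_cluster


if __name__ == "__main__":
    for n in (8, 32):
        print(f"{n}-terminal network: up to {max_processors(n)} processors")
```

Both figures in the abstract are consistent with the same inferred cluster size, which is why a single constant suffices here.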