Mesh-of-trees and alternative interconnection networks for single-chip parallelism

Authors:
Aydin O. Balkan; Gang Qu; Uzi Vishkin
Affiliations:
Electrical and Computer Engineering Department and Institute of Advanced Computer Studies, University of Maryland, College Park, MD (all three authors)
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2009

Abstract

In single-chip parallel processors, it is crucial to implement a high-throughput, low-latency interconnection network to connect the on-chip components, especially the processing units and the memory units. In this paper, we propose a new mesh-of-trees (MoT) implementation of the interconnection network and evaluate it against metrics such as wire complexity, total register count, single-switch delay, maximum throughput, throughput-latency tradeoffs, and post-layout performance. We show that on-chip interconnection networks can provide higher bandwidth between processors and a shared first-level cache than previously considered possible, enabling greater scalability for memory architectures that require such bandwidth. MoT is also compared, both analytically and experimentally, with some other traditional network topologies, such as the hypercube, butterfly, fat tree, and butterfly fat tree. For a 64-terminal network in 90-nm technology, MoT provides higher throughput and lower latency at comparable area, especially when the input traffic (i.e., the on-chip parallelism) is high. A recurring problem in networking and communication is achieving good sustained throughput, as opposed to a high theoretical peak that does not materialize for typical workloads. Our quantitative results demonstrate a clear advantage of the proposed MoT network in the context of single-chip parallel processing.
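To make the topology concrete, here is a minimal Python sketch of routing in an idealized mesh of trees. It assumes the textbook structure rather than the paper's circuit-level design: n sources and m destinations, both powers of two; each source owns a binary fan-out tree whose leaves reach every destination, and each destination owns a binary fan-in tree whose leaves receive from every source. The names `route`, `switch_count`, and `shared_switches`, and the heap-style node numbering, are hypothetical and introduced only for illustration. Buffering, flow control, and pipelining are not modeled.

```python
"""Idealized mesh-of-trees (MoT) routing sketch (illustrative assumptions only).

Assumed model: n sources, m destinations (powers of two). Source s owns a
binary fan-out tree with m leaves; destination d owns a binary fan-in tree
with n leaves. Leaf d of fan-out tree s is wired to leaf s of fan-in tree d.
Tree nodes use heap numbering: root = 1, children of k are 2k and 2k + 1.
"""


def _levels(x: int) -> int:
    """Return log2(x), rejecting values that are not positive powers of two."""
    if x < 1 or x & (x - 1):
        raise ValueError(f"{x} is not a power of two")
    return x.bit_length() - 1


def route(src: int, dst: int, n: int, m: int) -> list[tuple[str, int, int]]:
    """List the switch nodes a packet visits, as (tree kind, owner, heap index)."""
    out_levels, in_levels = _levels(m), _levels(n)
    path = []
    # Fan-out tree of src: steer left/right by the bits of dst, MSB first.
    node = 1
    for level in range(out_levels):
        path.append(("fanout", src, node))
        bit = (dst >> (out_levels - 1 - level)) & 1
        node = 2 * node + bit
    # Fan-in tree of dst: enter at leaf src and climb to the root,
    # passing one 2-to-1 arbitration node per level.
    node = (1 << in_levels) + src
    for _ in range(in_levels):
        node //= 2
        path.append(("fanin", dst, node))
    return path


def switch_count(n: int, m: int) -> int:
    """Internal nodes: n fan-out trees of m - 1 nodes plus m fan-in trees of n - 1."""
    return n * (_levels(m) * 0 + m - 1) + m * (_levels(n) * 0 + n - 1)


def shared_switches(p: list, q: list) -> set:
    """Switch nodes used by both paths, i.e. the places where they can contend."""
    return set(p) & set(q)


if __name__ == "__main__":
    print(len(route(3, 17, 64, 64)))   # 12 switch hops (log2 64 + log2 64)
    print(switch_count(64, 64))        # 8064 switch nodes in this model
    # Distinct sources and distinct destinations: no shared switches.
    print(shared_switches(route(0, 5, 8, 8), route(1, 6, 8, 8)))  # set()
```

Because a packet uses only its source's fan-out tree and its destination's fan-in tree, two packets with distinct sources and distinct destinations share no switch in this model; contention arises only inside a fan-in tree when several packets target the same destination. This offers one intuition for why MoT can sustain high throughput under heavy traffic, though the paper's measured results depend on implementation details this sketch omits.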