Fat-trees: universal networks for hardware-efficient supercomputing
IEEE Transactions on Computers
Introduction to parallel algorithms and architectures: array, trees, hypercubes
Introduction to parallel algorithms and architectures: array, trees, hypercubes
An O(log N) deterministic packet-routing scheme
Journal of the ACM (JACM)
ICS '90 Proceedings of the 4th international conference on Supercomputing
Explicit multi-threading (XMT) bridging models for instruction parallelism (extended abstract)
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
A methodology for correct-by-construction latency insensitive design
ICCAD '99 Proceedings of the 1999 IEEE/ACM international conference on Computer-aided design
Compact, multilayer layout for butterfly fat-tree
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
A Survey of Parallel Machine Organization and Programming
ACM Computing Surveys (CSUR)
Towards a first vertical prototyping of an extremely fine-grained parallel programming approach
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
On the area of hypercube layouts
Information Processing Letters
IEEE Transactions on Parallel and Distributed Systems
Banyan networks for partitioning multiprocessor systems
ISCA '73 Proceedings of the 1st annual symposium on Computer architecture
Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
A Delay Model and Speculative Architecture for Pipelined Routers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A complexity theory for VLSI
Building the 4 Processor SB-PRAM Prototype
HICSS '97 Proceedings of the 30th Hawaii International Conference on System Sciences: Advanced Technology Track - Volume 5
Structured interconnect architecture: a solution for the non-scalability of bus-based SoCs
Proceedings of the 14th ACM Great Lakes symposium on VLSI
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Evaluation of MP-SoC Interconnect Architectures: a Case Study
IWSOC '04 Proceedings of the System-on-Chip for Real-Time Applications, 4th IEEE International Workshop
Design of FPGA interconnect for multilevel metallization
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A Mesh-of-Trees Interconnection Network for Single-Chip Parallel Processing
ASAP '06 Proceedings of the IEEE 17th International Conference on Application-specific Systems, Architectures and Processors
HOTI '07 Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects
The NYU Ultracomputer Designing an MIMD Shared Memory Parallel Computer
IEEE Transactions on Computers
The Performance of Multistage Interconnection Networks for Multiprocessors
IEEE Transactions on Computers
IEEE Micro
Fpga-based prototype of a pram-on-chip processor
Proceedings of the 5th conference on Computing frontiers
Design issues in next-generation merchant switch fabrics
IEEE/ACM Transactions on Networking (TON)
New lower bound techniques for VLSI
SFCS '81 Proceedings of the 22nd Annual Symposium on Foundations of Computer Science
An area-efficient high-throughput hybrid interconnection network for single-chip parallel processing
Proceedings of the 45th annual Design Automation Conference
Synthesis of predictable networks-on-chip-based interconnect architectures for chip multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hardware design, prototyping and studies of the explicit multi-threading (xmt) paradigm
Hardware design, prototyping and studies of the explicit multi-threading (xmt) paradigm
Mesh-of-trees interconnection network for an explicitly multi-threaded parallel computer architecture
A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A Layout-Aware Analysis of Networks-on-Chip and Traditional Interconnects for MPSoCs
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
An Application-Specific Design Methodology for On-Chip Crossbar Generation
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
BMSYN: Bus Matrix Communication Architecture Synthesis for MPSoC
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
OTIS-MOT: an efficient interconnection network for parallel processing
The Journal of Supercomputing
On two-layer brain-inspired hierarchical topologies – a rent's rule approach –
Transactions on High-Performance Embedded Architectures and Compilers IV
Hi-index | 0.00 |
In single-chip parallel processors, it is crucial to implement a high-throughput low-latency interconnection network to connect the on-chip components, especially the processing units and the memory units. In this paper, we propose a new mesh of trees (MoT) implementation of the interconnection network and evaluate it relative to metrics such as wire complexity, total register count, single switch delay, maximum throughput, tradeoffs between throughput and latency, and post-layout performance. We show that on-chip interconnection networks can provide higher bandwidth between processors and shared first-level cache than previously considered possible, facilitating greater scalability of memory architectures that require that. MoT is also compared, both analytically and experimentally, to some other traditional network topologies, such as hypercube, butterfly, fat trees and butterfly fat trees. When we evaluate a 64-terminal MoT network at 90-nm technology, concrete results show that MoT provides higher throughput and lower latency especially when the input traffic (or the on-chip parallelism) is high, at comparable area. A recurring problem in networking and communication is that of achieving good sustained throughput in contrast to just high theoretical peak performance that does not materialize for typical work loads. Our quantitative results demonstrate a clear advantage of the proposed MoT network in the context of single-chip parallel processing.