Communications of the ACM
Introduction to parallel algorithms and architectures: array, trees, hypercubes
Introduction to parallel algorithms and architectures: array, trees, hypercubes
Translating concurrent communicating programs into asynchronous circuits
Translating concurrent communicating programs into asynchronous circuits
A family of routing and communication chips based on the Mosaic
Proceedings of the 1993 symposium on Research on integrated systems
Towards a first vertical prototyping of an extremely fine-grained parallel programming approach
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Sequential Optimization of Asynchronous and Synchronous Finite-State Machines: Algorithms and Tools
Sequential Optimization of Asynchronous and Synchronous Finite-State Machines: Algorithms and Tools
Globally-asynchronous locally-synchronous systems (performance, reliability, digital)
Globally-asynchronous locally-synchronous systems (performance, reliability, digital)
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Robust interfaces for mixed-timing systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
An Asynchronous NOC Architecture Providing Low Latency Service and Its Multi-Level Design Framework
ASYNC '05 Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Key research problems in NoC design: a holistic perspective
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
HOTI '07 Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects
A Survey and Taxonomy of GALS Design Styles
IEEE Design & Test
A GALS Infrastructure for a Massively Parallel Multiprocessor
IEEE Design & Test
MOUSETRAP: high-speed transition-signaling asynchronous pipelines
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Case study of gate-level logic simulation on an extremely fine-grained chip multiprocessor
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
An area-efficient high-throughput hybrid interconnection network for single-chip parallel processing
Proceedings of the 45th annual Design Automation Conference
Practical asynchronous interconnect network design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Multisynchronous and Fully Asynchronous NoCs for GALS Architectures
IEEE Design & Test
Using simple abstraction to reinvent computing for parallelism
Communications of the ACM
Analysis and optimization for pipelined asynchronous systems
Analysis and optimization for pipelined asynchronous systems
Using simple abstraction to reinvent computing for parallelism
Communications of the ACM
Link pipelining strategies for an application-specific asynchronous NoC
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
A low-latency adaptive asynchronous interconnection network using bi-modal router nodes
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Proceedings of the 4th International Workshop on Network on Chip Architectures
A source-synchronous Htree-based network-on-chip
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Exploring topologies for source-synchronous ring-based network-on-chip
Proceedings of the Conference on Design, Automation and Test in Europe
Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.02 |
A new asynchronous interconnection network is introduced for globally-asynchronous locally-synchronous (GALS)chip multiprocessors. The network eliminates the need for global clock distribution, and can interface multiple synchronous timing domains operating at unrelated clock rates.In particular, two new highly-concurrent asynchronous components are introduced which provide simple routing and arbitration/merge functions.Post-layout simulations in identical commercial 90nm technology indicate that comparable recent synchronous router nodes have 5.6-10.7x more energy per packet and 2.8-6.4x greater area than the new asynchronous nodes.Under random traffic, the network provides significantly lower latency and competitive throughput over the entire operating range of the 800 MHz network and through mid-range traffic rates for the 1.36 GHz network, but with degradation at higher traffic rates. Preliminary evaluations are also presented for a mixed-timing (GALS) network in a shared-memory parallel architecture, running both random traffic and parallel benchmark kernels, as well as directions for further improvement.