This paper presents a globally asynchronous locally synchronous (GALS)-compatible circuit-switched on-chip network that is well suited to many-core platforms targeting streaming digital signal processing and embedded applications, which typically have a high degree of task-level parallelism among computational kernels. Inter-processor communication is achieved through a simple yet effective reconfigurable source-synchronous network. Interconnect paths between processors can sustain a peak throughput of one word per cycle. A theoretical model is developed for analyzing the performance of the network. A GALS chip utilizing this network was fabricated in 65 nm complementary metal-oxide-semiconductor technology; it contains 164 programmable processors, three accelerators, and three shared memory modules. To evaluate the efficiency of this platform, a complete 802.11a wireless local area network baseband receiver was implemented. The receiver has a real-time throughput of 54 Mb/s with all processors running at 594 MHz and 0.95 V, and consumes an average of 174.8 mW, of which 12.2 mW (or 7.0%) is dissipated by its interconnect links and switches. With the chip's dual supply voltages set at 0.95 V and 0.75 V, and with individual processors' oscillators operating at workload-based optimal frequencies, the receiver consumes 123.2 mW, a 29.5% reduction in power. Measured power consumption values from the chip are within 2% to 5% of the estimated values.
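The two percentages quoted above follow directly from the reported power figures. A minimal Python sketch of that arithmetic is given here for reference; the variable and function names are illustrative and do not come from the paper, and only the three power values are taken from the abstract.

```python
# Power figures quoted in the abstract, in milliwatts.
total_single_supply_mw = 174.8   # all processors at 594 MHz and 0.95 V
interconnect_mw = 12.2           # dissipated by interconnect links and switches
total_dual_supply_mw = 123.2     # dual supplies, workload-based frequencies


def percent(part, whole):
    """Return part as a percentage of whole (illustrative helper)."""
    return 100.0 * part / whole


# Share of total power spent in the interconnect.
interconnect_share = percent(interconnect_mw, total_single_supply_mw)

# Relative saving from dual supplies plus per-processor frequencies.
power_reduction = percent(
    total_single_supply_mw - total_dual_supply_mw,
    total_single_supply_mw,
)

print(f"interconnect share: {interconnect_share:.1f}%")
print(f"power reduction:    {power_reduction:.1f}%")
```

Rounded to one decimal place, the results agree with the 7.0% and 29.5% stated in the abstract.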