Reducing power in high-performance microprocessors
DAC '98 Proceedings of the 35th annual Design Automation Conference
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
A Delay Model and Speculative Architecture for Pipelined Routers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Low-Latency Virtual-Channel Routers for On-Chip Networks
Proceedings of the 31st annual international symposium on Computer architecture
Asynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Design of On-chip and Off-chip Interfaces for a GALS NoC Architecture
ASYNC '06 Proceedings of the 12th IEEE International Symposium on Asynchronous Circuits and Systems
Dynamic Power Management by Combination of Dual Static Supply Voltages
ISQED '07 Proceedings of the 8th International Symposium on Quality Electronic Design
Thousand core chips: a technology perspective
Proceedings of the 44th annual Design Automation Conference
Analysis of dynamic voltage/frequency scaling in chip-multiprocessors
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Globally Asynchronous, Locally Synchronous Circuits: Overview and Outlook
IEEE Design & Test
A scalable dual-clock FIFO for data transfers between arbitrary and haltable clock domains
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Dynamic Voltage and Frequency Scaling Architecture for Units Integration within a GALS NoC
NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
From SODA to scotch: The evolution of a wireless baseband processor
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Design space exploration of a mesochronous link for cost-effective and flexible GALS NOCs
Proceedings of the Conference on Design, Automation and Test in Europe
Design and analysis of adaptive processor
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Power-aware dynamic memory management on many-core platforms utilizing DVFS
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on ESTIMedia'10
Hi-index | 0.00 |
This paper presents a many-core heterogeneous computational platform that employs a GALS compatible circuit-switched on-chip network. The platform targets streaming DSP and embedded applications that have a high degree of task-level parallelism among computational kernels. The test chip was fabricated in 65nm CMOS consisting of 164 simple small programmable cores, three dedicated-purpose accelerators and three shared memory modules. All processors are clocked by their own local oscillators and communication is achieved through a simple yet effective source-synchronous communication technique that allows each interconnection link between any two processors to sustain a peak throughput of one data word per cycle. A complete 802.11a WLAN baseband receiver was implemented on this platform. It has a real-time throughput of 54 Mbps with all processors running at 594 MHz and 0.95 V, and consumes an average 174.76 mW with 12.18 mW (or 7.0%) dissipated by its interconnection links. We can fully utilize the benefit of the GALS architecture and by adjusting each processor's oscillator to run at a workload-based optimal clock frequency with the chip's dual supply voltages set at 0.95 V and 0.75 V, the receiver consumes only 123.18 mW, a 29.5% in power reduction. Measured results of its power consumption on the real chip come within the difference of only 2-5% compared with the estimated results showing our design to be highly reliable and efficient.