Smart Memories: a modular reconfigurable architecture
Proceedings of the 27th annual international symposium on Computer architecture
Transputers-Past, Present and Future
IEEE Micro
Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A 1.5GHz third generation itanium® 2 processor
Proceedings of the 40th annual Design Automation Conference
Architecture Design of Reconfigurable Pipelined Datapaths
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
VLSI Design and Verification of the Imagine Processor
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
Energy characterization of a tiled architecture processor with on-chip networks
Proceedings of the 2003 international symposium on Low power electronics and design
Globally-asynchronous locally-synchronous systems (performance, reliability, digital)
Globally-asynchronous locally-synchronous systems (performance, reliability, digital)
Synchroscalar: A Multiple Clock Domain, Power-Aware, Tile-Based Embedded Processor
Proceedings of the 31st annual international symposium on Computer architecture
Performance and Power Analysis of Globally Asynchronous Locally Synchronous Multi-Processor Systems
ISVLSI '06 Proceedings of the IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures
Tile size selection for low-power tile-based architectures
Proceedings of the 3rd conference on Computing frontiers
Wavefront Array Processor: Language, Architecture, and Applications
IEEE Transactions on Computers
Computer
A scalable dual-clock FIFO for data transfers between arbitrary and haltable clock domains
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.01 |
This paper presents the architecture of an asynchronous array of simple processors (AsAP), and evaluates its key architectural features as well as its performance and energy efficiency. The AsAP processor calculates DSP applications with high energy-efficiency, is capable of high-performance, is easily scalable, and is well-suited to future fabrication technologies. It is composed of a two-dimensional array of simple single-issue programmable processors interconnected by a reconfigurable mesh network. Processors are designed to capture the kernels of many DSP algorithms with very little additional overhead. Each processor contains its own tunable and haltable clock oscillator, and processors operate completely asynchronously with respect to each other in a globally asynchronous locally synchronous (GALS) fashion. A 6脳6 AsAP array has been designed and fabricated in a 0.18 μm CMOS technology. Each processor occupies 0.66 mm2, is fully functional at a clock rate of 520---540 MHz at 1.8 V, and dissipates an average of 35 mW per processor at 520 MHz under typical conditions while executing applications such as a JPEG encoder core and a complete IEEE 802.11a/g wireless LAN baseband transmitter. Most processors operate at over 600 MHz at 2.0 V. Processors dissipate 2.4 mW at 116 MHz and 0.9 V. A single AsAP processor occupies 4% or less area than a single processing element in other multi-processor chips. Compared to several RISC processors (single issue MIPS and ARM), AsAP achieves performance 27---275 times greater, energy efficiency 96---215 times greater, while using far less area. Compared to the TI C62x high-end DSP processor, AsAP achieves performance 0.8---9.6 times greater, energy efficiency 10---75 times greater, with an area 7---19 times smaller. Compared to ASIC implementations, AsAP achieves performance within a factor of 2---5, energy efficiency within a factor of 3---50, with area within a factor of 2.5---3. These data are for varying numbers of AsAP processors per benchmark.