Scientific computing and multimedia applications frequently call loop-intensive functions that dominate execution time. Applying homogeneous, parallel processors (e.g., single-instruction, multiple-data (SIMD) and very long instruction word (VLIW)) is a common approach to minimizing execution time. However, many benchmark applications offer disappointing degrees of instruction-level parallelism (ILP) that cause these architectures to fall short of expected performance gains. This paper presents findings on execution-time speedup achieved by heterogeneous massively parallel processors: standard reduced instruction-set computing (RISC) CPUs tightly coupled with arrays of super-complex instruction-set computing (SuperCISC) datapaths on the same chip. SuperCISC datapaths are created by mapping frequently called functions into reconfigurable hardware. Encouraging performance results from the RISC/SuperCISC architecture point to the efficiency of reconfigurable devices in supporting large numbers of parallel computational accelerators. Calls to SuperCISC functions can greatly expedite execution when applied to CPUs that support extensible instruction sets. In this paper we show how SuperCISC functions can accelerate an application by up to 25x over a 4-way VLIW. SuperCISC functions show superlinear speedup, a performance gain significantly greater than the software's ILP. SuperCISC functions also benefit from cycle compression, a reduction of the idle cycle time for an operation to execute within a traditional CPU. Implementing software controls, or if-then-else statements, as hardware multiplexers within a SuperCISC function further advances performance.