Partitioned register files for VLIWs: a preliminary analysis of tradeoffs
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
High-level transformations for minimizing syntactic variances
DAC '93 Proceedings of the 30th international Design Automation Conference
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Managing pipeline-reconfigurable FPGAs
FPGA '98 Proceedings of the 1998 ACM/SIGDA sixth international symposium on Field programmable gate arrays
High-performance carry chains for FPGAs
FPGA '98 Proceedings of the 1998 ACM/SIGDA sixth international symposium on Field programmable gate arrays
A video signal processor for MIMD multiprocessing
DAC '98 Proceedings of the 35th annual Design Automation Conference
PipeRench: a co/processor for streaming multimedia acceleration
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ECL: a specification environment for system-level design
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Automatic test pattern generation for functional RTL circuits using assignment decision diagrams
Proceedings of the 37th Annual Design Automation Conference
The Garp Architecture and C Compiler
Computer
IEEE Design & Test
Imagine: Media Processing with Streams
IEEE Micro
RaPiD - Reconfigurable Pipelined Datapath
FPL '96 Proceedings of the 6th International Workshop on Field-Programmable Logic, Smart Applications, New Paradigms and Compilers
Profiling tools for hardware/software partitioning of embedded applications
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
PACT HDL: a compiler targeting ASICS and FPGAS with power and performance optimizations
Power aware computing
Architecture Design of Reconfigurable Pipelined Datapaths
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
The Chimaera reconfigurable functional unit
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Garp: a MIPS processor with a reconfigurable coprocessor
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Incremental reconfiguration for pipelined applications
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Mapping applications to the RaPiD configurable architecture
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
RVC - A Reconfigurable Coprocessor for Vector Processing Applications
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
Specifying and Compiling Applications for RaPiD
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
A MATLAB Compiler for Distributed, Heterogeneous, Reconfigurable Computing Systems
FCCM '00 Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines
SPARK: A High-Lev l Synthesis Framework For Applying Parallelizing Compiler Transformations
VLSID '03 Proceedings of the 16th International Conference on VLSI Design
Media Processing Applications on the Imagine Stream Processor
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
FCCM '03 Proceedings of the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
FCCM '03 Proceedings of the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Behavioral Synthesis of Data-Dominated Circuits for Minimal Energy Implementation
VLSID '05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design
Overview of a compiler for synthesizing MATLAB programs onto FPGAs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2002 international symposium on low-power electronics and design (ISLPED)
On the sphere-decoding algorithm I. Expected complexity
IEEE Transactions on Signal Processing - Part I
Scalar coprocessors for accelerating the G723.1 and G729A speech coders
IEEE Transactions on Consumer Electronics
Using global code motions to improve the quality of results for high-level synthesis
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Boundary macroblock padding in MPEG-4 video decoding using a graphics coprocessor
IEEE Transactions on Circuits and Systems for Video Technology
An automated, reconfigurable, low-power RFID tag
Proceedings of the 43rd annual Design Automation Conference
Reducing power while increasing performance with supercisc
ACM Transactions on Embedded Computing Systems (TECS)
An automated, FPGA-based reconfigurable, low-power RFID tag
Microprocessors & Microsystems
Radio frequency identification prototyping
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Optimizing near-ML MIMO detector for SDR baseband on parallel programmable architectures
Proceedings of the conference on Design, automation and test in Europe
A design automation and power estimation flow for RFID systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Interconnect customization for a hardware fabric
ACM Transactions on Design Automation of Electronic Systems (TODAES)
VLSI architecture design approaches for real-time video processing
WSEAS Transactions on Circuits and Systems
A low-power CMOS thyristor based delay element with programmability extensions
Proceedings of the 19th ACM Great Lakes symposium on VLSI
ICC'08 Proceedings of the 12th WSEAS international conference on Circuits
Design space exploration for low-power reconfigurable fabrics
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Hi-index | 0.00 |
This paper presents an architecture that combines VLIW (very long instruction word) processing with the capability to introduce application-specific customized instructions and highly parallel combinational hardware functions for the acceleration of signal processing applications. To support this architecture, a compilation and design automation flow is described for algorithms written in C. The key contributions of this paper are as follows: (1) a 4-way VLIW processor implemented in an FPGA, (2) large speedups through hardware functions, (3) a hardware/software interface with zero overhead, (4) a design methodology for implementing signal processing applications on this architecture, (5) tractable design automation techniques for extracting and synthesizing hardware functions. Several design tradeoffs for the architecture were examined including the number of VLIW functional units and register file size. The architecture was implemented on an Altera Stratix II FPGA. The Stratix II device was selected because it offers a large number of high-speed DSP (digital signal processing) blocks that execute multiply-accumulate operations. Using the MediaBench benchmark suite, we tested our methodology and architecture to accelerate software. Our combined VLIW processor with hardware functions was compared to that of software executing on a RISC processor, specifically the soft core embedded NIOS II processor. For software kernels converted into hardware functions, we show a hardware performance multiplier of up to 230 times that of software with an average 63 times faster. For the entire application in which only a portion of the software is converted to hardware, the performance improvement is as much as 30X times faster than the nonaccelerated application, with a 12X improvement on average.