A video signal processor for MIMD multiprocessing
DAC '98 Proceedings of the 35th annual Design Automation Conference
The Garp Architecture and C Compiler
Computer
Imagine: Media Processing with Streams
IEEE Micro
RaPiD - Reconfigurable Pipelined Datapath
FPL '96 Proceedings of the 6th International Workshop on Field-Programmable Logic, Smart Applications, New Paradigms and Compilers
Profiling tools for hardware/software partitioning of embedded applications
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
PACT HDL: a compiler targeting ASICS and FPGAS with power and performance optimizations
Power aware computing
The Chimaera reconfigurable functional unit
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
RVC - A Reconfigurable Coprocessor for Vector Processing Applications
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
A Video Compression Case Study on a Reconfigurable VLIW Architecture
Proceedings of the conference on Design, automation and test in Europe
FCCM '03 Proceedings of the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Behavioral Synthesis of Data-Dominated Circuits for Minimal Energy Implementation
VLSID '05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design
An 8x8 IDCT Implementation on an FPGA-Augmented TriMedia
FCCM '01 Proceedings of the the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Scalar coprocessors for accelerating the G723.1 and G729A speech coders
IEEE Transactions on Consumer Electronics
Using global code motions to improve the quality of results for high-level synthesis
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Boundary macroblock padding in MPEG-4 video decoding using a graphics coprocessor
IEEE Transactions on Circuits and Systems for Video Technology
An automated, reconfigurable, low-power RFID tag
Proceedings of the 43rd annual Design Automation Conference
ACM Transactions on Embedded Computing Systems (TECS)
Reducing power while increasing performance with supercisc
ACM Transactions on Embedded Computing Systems (TECS)
An automated, FPGA-based reconfigurable, low-power RFID tag
Microprocessors & Microsystems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Radio frequency identification prototyping
ACM Transactions on Design Automation of Electronic Systems (TODAES)
VESPA: portable, scalable, and flexible FPGA-based vector processors
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
A design automation and power estimation flow for RFID systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Interconnect customization for a hardware fabric
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A low-power CMOS thyristor based delay element with programmability extensions
Proceedings of the 19th ACM Great Lakes symposium on VLSI
Vector Processing as a Soft Processor Accelerator
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Fine-grain performance scaling of soft vector processors
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Efficient multi-ported memories for FPGAs
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
Customizing the datapath and ISA of soft VLIW processors
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
A configurable multi-ported register file architecture for soft processor cores
ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
Dynamically reconfigurable register file for a softcore VLIW processor
Proceedings of the Conference on Design, Automation and Test in Europe
Intermediate fabrics: virtual architectures for circuit portability and fast placement and routing
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
FPGA implementation of variable-precision floating-point arithmetic
APPT'11 Proceedings of the 9th international conference on Advanced parallel processing technologies
Making wide-issue VLIW processors viable on FPGAs
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Multi-ported memories for FPGAs via XOR
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
OCTAVO: an FPGA-centric processor family
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
A run-time task migration scheme for an adjustable issue-slots multi-core processor
ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
Portable, flexible, and scalable soft vector processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A new SBST algorithm for testing the register file of VLIW processors
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
The capability and heterogeneity of new FPGA (Field Programmable Gate Array) devices continues to increase with each new line of devices. Efficiently programming these devices is increasing in difficulty. However, FPGAs continue to be utilized for algorithms traditionally targeted to embedded DSP microprocessors such as signal and image processing applications.This paper presents an architecture that combines VLIW (Very Large Instruction Word) processing with the capability to introduce application specific customized instructions and complex hardware functions. To support this architecture, a compilation and design automation flow are described for programs written in C.Several design tradeoffs for the architecture were examined including number of VLIW functional units and register file size. The architecture was implemented on an Altera Stratix II FPGA. The Stratix II device was selected because it offers a large number of high-speed DSP (digital signal processing) blocks that execute multiply accumulate operations.We show that our combined VLIW with hardware functions exhibit as much as 230X speedup and 63X on average for computational kernels for a set of benchmarks. This allows for an overall speedup of 30X and 12X on average for signal processing benchmarks from the MediaBench.