An FPGA-based VLIW processor with custom hardware execution

Authors:
Alex K. Jones;Raymond Hoare;Dara Kusic;Joshua Fazekas;John Foster
Affiliations:
University of Pittsburgh, Pittsburgh, PA;University of Pittsburgh, Pittsburgh, PA;University of Pittsburgh, Pittsburgh, PA;University of Pittsburgh, Pittsburgh, PA;University of Pittsburgh, Pittsburgh, PA
Venue:
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Year:
2005

Abstract

The capability and heterogeneity of FPGA (Field Programmable Gate Array) devices continue to increase with each new line of devices, and programming them efficiently is becoming more difficult. Nevertheless, FPGAs continue to be used for algorithms traditionally targeted at embedded DSP microprocessors, such as signal and image processing applications.

This paper presents an architecture that combines VLIW (Very Long Instruction Word) processing with the capability to introduce application-specific customized instructions and complex hardware functions. To support this architecture, a compilation and design automation flow is described for programs written in C.

Several design tradeoffs for the architecture were examined, including the number of VLIW functional units and the register file size. The architecture was implemented on an Altera Stratix II FPGA. The Stratix II device was selected because it offers a large number of high-speed DSP (digital signal processing) blocks that execute multiply-accumulate operations.

We show that our combined VLIW architecture with hardware functions exhibits as much as a 230X speedup, and 63X on average, for the computational kernels of a set of benchmarks. This allows for an overall speedup of as much as 30X, and 12X on average, for signal processing benchmarks from MediaBench.
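
The gap between the kernel speedups and the overall speedups reported in the abstract is what Amdahl's law would predict when only part of a program is moved into hardware. The sketch below is an illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, the abstract gives no per-benchmark kernel fractions, and averages across benchmarks do not compose exactly through the formula, so pairing the 63X and 12X averages is only a rough guide.

```python
"""Amdahl's-law sketch relating kernel speedup to whole-program speedup.

Assumption: the program splits into an accelerated kernel and an
unaccelerated remainder. This model is not taken from the paper.
"""


def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Whole-program speedup when `kernel_fraction` of the original run time
    is accelerated by `kernel_speedup` and the rest runs unchanged."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def implied_kernel_fraction(overall: float, kernel_speedup: float) -> float:
    """Invert Amdahl's law: the fraction of original run time the kernel
    must have occupied to turn `kernel_speedup` into `overall`."""
    if kernel_speedup <= 1.0 or not 1.0 <= overall <= kernel_speedup:
        raise ValueError("need 1 <= overall <= kernel_speedup and kernel_speedup > 1")
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / kernel_speedup)


if __name__ == "__main__":
    # Illustrative pairing of the abstract's average figures (63X kernel,
    # 12X overall); the real per-benchmark fractions are not given there.
    fraction = implied_kernel_fraction(overall=12.0, kernel_speedup=63.0)
    print(f"implied kernel share of run time: {fraction:.3f}")
    print(f"round-trip overall speedup: {overall_speedup(fraction, 63.0):.1f}X")
```

Under this model, even a very large kernel speedup is capped by the code left on the processor: with the kernel sped up without bound, the overall speedup approaches 1 / (1 - kernel_fraction).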