Applications that require digital signal processing (DSP) functions are typically mapped onto general-purpose DSP processors. With the introduction of advanced FPGA architectures with built-in DSP support, a new hardware alternative is available to DSP designers. By exploiting their inherent parallelism, FPGAs are expected to outperform DSP processors. However, migrating assembly code to hardware is typically a very arduous process. This paper describes the process and considerations for automatically translating software assembly and binary code targeted at general DSP processors into Register Transfer Level (RTL) VHDL or Verilog code to be mapped onto commercial FPGAs. The Texas Instruments C6000 DSP processor architecture is used as the DSP processor platform, and the Xilinx Virtex II as the target FPGA. Various optimizations are discussed, including loop unrolling, induction variable analysis, memory and register optimizations, scheduling, and resource binding. Experimental results on resource usage and performance are shown for ten software binary benchmarks in the signal processing and image processing domains. The FPGA designs reduce execution cycles by 3-20x and execution time by 1.3-5x relative to the DSP processors.
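The reported gain in execution time is smaller than the gain in execution cycles. One plausible reading, not stated in the abstract itself, is that the FPGA design runs at a lower clock rate than the DSP processor, since execution time is the cycle count divided by the clock frequency. The sketch below illustrates only that arithmetic. The cycle counts and clock frequencies are hypothetical values chosen for illustration; they are not measurements from the paper, and the function names are likewise assumptions.

```python
def execution_time_us(cycles: int, clock_mhz: float) -> float:
    """Execution time in microseconds: cycle count divided by clock rate in MHz."""
    return cycles / clock_mhz


def time_speedup(dsp_cycles: int, dsp_mhz: float,
                 fpga_cycles: int, fpga_mhz: float) -> float:
    """Ratio of DSP execution time to FPGA execution time."""
    return (execution_time_us(dsp_cycles, dsp_mhz)
            / execution_time_us(fpga_cycles, fpga_mhz))


# Hypothetical numbers for illustration only (not taken from the paper).
dsp_cycles = 10_000   # cycles on the DSP processor
fpga_cycles = 1_000   # cycles on the FPGA design: 10x fewer
dsp_mhz = 400.0       # assumed DSP clock
fpga_mhz = 100.0      # assumed FPGA clock, 4x slower

cycle_speedup = dsp_cycles / fpga_cycles
speedup = time_speedup(dsp_cycles, dsp_mhz, fpga_cycles, fpga_mhz)

print(f"cycle reduction: {cycle_speedup:.1f}x")
print(f"time reduction:  {speedup:.1f}x")
```

Under these assumed values a 10x reduction in cycles yields a 2.5x reduction in execution time, because the time speedup equals the cycle speedup multiplied by the ratio of the FPGA clock to the DSP clock.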