Vector processing as a soft-core CPU accelerator
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Achieving programming model abstractions for reconfigurable computing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Massively parallel programming models used as hardware description languages: the OpenCL case
Proceedings of the International Conference on Computer-Aided Design
An efficient hardware architecture from c program with memory access to hardware
ICCSA'10 Proceedings of the 2010 international conference on Computational Science and Its Applications - Volume Part II
VLSI Design - Special issue on VLSI Circuits, Systems, and Architectures for Advanced Image and Video Compression Standards
A Unified FPGA-Based System Architecture for 2-D Discrete Wavelet Transform
Journal of Signal Processing Systems
Portable, flexible, and scalable soft vector processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
SWSL: software synthesis for network lookup
ANCS '13 Proceedings of the ninth ACM/IEEE symposium on Architectures for networking and communications systems
Hi-index | 0.00 |
Methodologies for synthesis of stand-alone hardware modules from C/C++ based languages have been gaining adoption for embedded system design, as an essential means to stay ahead of increasing performance, complexity, and time-to-market demands. However, using C to generate stand-alone blocks does not allow for truly seamless unification of embedded software and hardware development flows. This paper describes a methodology for generating hardware accelerator modules that are tightly coupled with a soft RISC CPU, its tool chain, and its memory system. This coupling allows for several significant advancements: (1) a unified development environment with true pushbutton switching between original software and hardwareaccelerated implementations, (2) direct access to memory from the accelerator module, (3) full support for pointers and arrays, and (4) latency-aware pipelining of memory transactions. We also present results of our implementation, the C2H Compiler. Eight user test cases on common embedded applications show speedup factors of 13x-73x achieved in less than a few days.