CoRAM: an in-fabric memory architecture for FPGA-based computing
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
CONNECT: re-examining conventional wisdom for designing nocs in the context of FPGAs
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Prototype and evaluation of the CoRAM memory architecture for FPGA-based computing
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
When developing applications to run on FPGAs, we tend to expend great effort on crafting the custom hardware acceleration datapath, but blindly turn to the FPGA vendor tools and libraries for default on-chip interconnect and external interface solutions. This often leads to ineffective communication- or memory-bound implementations, since the default general-purpose solutions necessarily make design compromises for the sake of generality. This is counterproductive, as the FPGA's reconfigurability should afford great opportunities for performance gain and cost reduction through extensive application-specific customization of the interconnect and interface IPs. This work presents a compiler that generates a custom interconnect topology, with connectivity and capacity scaled to support an application's exact communication requirements at minimized cost. More specifically, the compiler analyzes an application developed for the CoRAM abstraction [1,2] to determine its connectivity and bandwidth requirements between the hardware processing kernels and external DRAM banks. The result is a finely tuned, custom-topology soft-logic network-on-chip interconnect, built with the CONNECT NoC framework [3]. We perform an extensive evaluation that benchmarks two applications against the standard CoRAM implementation flow, which relies on a fixed, generically tuned, general-purpose soft-logic network-on-chip. Our RTL-driven evaluation shows a large opportunity for area reduction and improved efficiency (up by 48%) without any impact on application performance.