HAL: a multi-paradigm approach to automatic data path synthesis
DAC '86 Proceedings of the 23rd ACM/IEEE Design Automation Conference
Stream-Oriented FPGA Computing in the Streams-C High Level Language
FCCM '00 Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines
Sharing of SRAM tables among NPN-equivalent LUTs in SRAM-based FPGAs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Designing Modular Hardware Accelerators in C with ROCCC 2.0
FCCM '10 Proceedings of the 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines
On clustering for maximal regularity extraction
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Embedding-based placement of processing element networks on FPGAs for physical model simulation
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Exploration with upgradeable models using statistical methods for physical model emulation
Proceedings of the 50th Annual Design Automation Conference
Hi-index | 0.00 |
Physical system models that consist of thousands of ordinary differential equations can be synthesized to field-programmable gate arrays (FPGAs) for highly-parallelized, real-time physical system emulation. Previous work introduced synthesis of custom networks of homogeneous processing elements, consisting of processing elements that are either all general differential equation solvers or are all custom solvers tailored to solve specific equations. However, a complex physical system model may contain different types of equations such that using only general solvers or only custom solvers does not provide all of the possible speedup. We introduce methods to synthesize a custom network of heterogeneous processing elements for emulating physical systems, where each element is either a general or custom differential equation solver. We show average speedups of 45x over a 3 GHz single-core desktop processor, and of 11x and 20x over a 3 GHz four-core desktop and a 763 MHz NVIDIA graphical processing unit, respectively. Compared to a commercial high-level synthesis tool including regularity extraction, the networks of heterogeneous processing elements were on average 10.8x faster. Compared to homogeneous networks of general and single-type custom processing elements, heterogeneous networks were on average 7x and 6x faster, respectively.