Automatic synthesis of physical system differential equation models to a custom network of general processing elements on FPGAs

Authors:
Chen Huang; Frank Vahid; Tony Givargis
Affiliations:
University of California, Riverside, CA; University of California, Riverside, CA; University of California, Irvine, CA
Venue:
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
Year:
2013

Abstract

Fast execution of physical system models has various uses, such as simulating physical phenomena or real-time testing of medical equipment. Physical system models commonly consist of thousands of differential equations. Solving such equations using software on microprocessor devices may be slow. Several past efforts implement such models as parallel circuits on special computing devices called Field-Programmable Gate Arrays (FPGAs), demonstrating large speedups due to the excellent match between the massive fine-grained local communication parallelism common in physical models and the fine-grained parallel compute elements and local connectivity of FPGAs. However, past implementation efforts were mostly manual or ad hoc. We present the first method for automatically converting a set of ordinary differential equations into circuits on FPGAs. The method uses a general Processing Element (PE) that we developed, designed to quickly solve a set of ordinary differential equations while using few FPGA resources. The method instantiates a network of general PEs, partitions equations among the PEs to minimize communication, generates each PE's custom program, creates custom connections among PEs, and maintains synchronization of all PEs in the network. Our experiments show that the method generates a 400-PE network on a commercial FPGA that executes four different models on average 15x faster than a 3 GHz Intel processor, 30x faster than a commercial 4-core ARM, 14x faster than a commercial 6-core Texas Instruments digital signal processor, and 4.4x faster than an NVIDIA 336-core graphics processing unit. We also show that the FPGA-based approach is reasonably cost effective compared to using the other platforms. 
The method yields circuits 2.1x faster than those produced by a commercial high-level synthesis tool that uses the traditional method for converting behavior to circuits, while using 2x fewer lookup tables and 2x fewer hard-core multiplier (DSP) units, though 3.5x more block RAM because its PEs are programmable. Furthermore, the method does not generate only a single fastest design; by varying the number of PEs, it generates a range of designs that trade off size and performance.
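
As a rough illustration of the flow described in the abstract (partition equations among PEs, let each PE update its own equations, then synchronize), the following minimal Python sketch may help. It is not the authors' tool: the diffusion-chain model, the forward-Euler update, the contiguous-block partitioner, and all function names are assumptions chosen only to keep the example small.

```python
# Hypothetical sketch: the model, the forward-Euler scheme, and the naive
# partitioner are illustrative assumptions, not the paper's PE design.

def make_chain_model(n, k=1.0):
    """Build a 1-D diffusion chain: dx_i/dt = k * (x_{i-1} - 2*x_i + x_{i+1}).

    Boundaries are held at zero. Returns the derivative function and, for
    each equation, the list of equations it reads from.
    """
    def deriv(i, x):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        return k * (left - 2.0 * x[i] + right)

    deps = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
    return deriv, deps


def partition_contiguous(n, num_pes):
    """Assign equation indices to PEs in contiguous blocks (naive heuristic)."""
    base, extra = divmod(n, num_pes)
    blocks, start = [], 0
    for p in range(num_pes):
        size = base + (1 if p < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def cut_edges(blocks, deps):
    """Count dependencies crossing PE boundaries, a proxy for communication."""
    owner = {i: p for p, block in enumerate(blocks) for i in block}
    crossing = sum(1 for i, ds in enumerate(deps) for j in ds
                   if owner[i] != owner[j])
    return crossing // 2  # dependencies here are symmetric, so halve the count


def simulate(deriv, blocks, x0, dt, steps):
    """Advance all equations in lockstep, one forward-Euler step at a time."""
    x = list(x0)
    for _ in range(steps):
        nxt = list(x)
        for block in blocks:      # each block stands in for one PE
            for i in block:
                nxt[i] = x[i] + dt * deriv(i, x)
        x = nxt                   # all PEs synchronize before the next step
    return x


deriv, deps = make_chain_model(8)
blocks = partition_contiguous(8, 4)
state = simulate(deriv, blocks, [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0], 0.1, 50)
```

Because every PE reads only the previous step's state, the result does not depend on how the equations are split; the partition affects only how many values must cross PE boundaries, which is what `cut_edges` estimates.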