Synthesis of networks of custom processing elements for real-time physical system emulation

Authors:
Chen Huang; Bailey Miller; Frank Vahid; Tony Givargis
Affiliations:
University of California, Riverside; University of California, Riverside; University of California, Riverside; University of California, Irvine
Venue:
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Year:
2013

Abstract

Emulating a physical system in real time or faster has numerous applications in cyber-physical system design and deployment. For example, the software of a cyber-device (e.g., a medical ventilator) can be tested through interaction with a real-time digital emulation of the target physical system (e.g., a human respiratory system). Physical system emulation typically involves iteratively solving thousands of ordinary differential equations (ODEs) that model the physical system. We describe an approach that creates custom processing elements (PEs) specialized to the ODEs of a particular model while maintaining some programmability, targeting implementation on field-programmable gate arrays (FPGAs). We detail the PE micro-architecture and the accompanying automated compilation and synthesis techniques. We also describe our efforts to use a high-level synthesis approach that incorporates regularity extraction techniques as an alternative FPGA-based solution, as well as an approach using graphics processing units (GPUs). We perform experiments with five models: a Weibel lung model, a Lutchen lung model, an atrial heart model, a neuron model, and a wave model; each model consists of several thousand ODEs and targets a Xilinx Virtex 6 FPGA. The experimental results show that the custom PE approach achieves 4X-9X speedups (average 6.7X) versus our previous general ODE-solver PE approach, and 7X-10X speedups (average 8.7X) versus high-level synthesis, while using approximately the same or fewer FPGA resources. Furthermore, the approach achieves speedups of 18X-32X (average 26X) versus an Nvidia GTX 460 GPU, average speedups of more than 100X versus a six-core TI DSP processor or a four-core ARM processor, and 24X versus an Intel i7 quad-core processor running at 3.06 GHz. While an FPGA implementation costs about 3X-5X more than the non-FPGA approaches, a speedup/dollar analysis shows a 10X improvement versus the next best approach, and the trend of decreasing FPGA costs should further improve speedup/dollar in the future.
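
To make concrete what "iteratively solving" a set of ODEs means, here is a minimal Python sketch. It is not the authors' method: the abstract does not name the integration scheme, so forward Euler is assumed here purely for illustration, and the toy chain model along with the names `euler_step`, `chain_derivs`, and `emulate` is hypothetical. The point is only that each time step updates every ODE once, which is the kind of per-equation work the abstract says is mapped onto processing elements.

```python
def euler_step(state, derivs, dt):
    """Advance every ODE by one forward-Euler step: y <- y + dt * dy/dt.

    Forward Euler is an assumption; the abstract does not name the solver.
    """
    rates = derivs(state)
    return [y + dt * r for y, r in zip(state, rates)]


def chain_derivs(state, source=1.0, k=1.0):
    """Hypothetical toy model (not from the paper).

    Each node relaxes toward its upstream neighbour: dy_i/dt = k * (y_{i-1} - y_i),
    with a constant source feeding the first node.
    """
    upstream = [source] + state[:-1]
    return [k * (u - y) for u, y in zip(upstream, state)]


def emulate(n_odes, dt, steps):
    """Iterate the whole system of n_odes equations for a number of time steps."""
    state = [0.0] * n_odes
    for _ in range(steps):
        state = euler_step(state, chain_derivs, dt)
    return state


if __name__ == "__main__":
    # A few thousand coupled ODEs, matching the scale mentioned in the abstract.
    final = emulate(n_odes=2000, dt=0.01, steps=100)
    print(len(final), final[0])
```

Real-time emulation requires that each step of length `dt` be computed in no more than `dt` of wall-clock time, which is why the per-step cost across thousands of equations matters.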