This paper presents an FPGA-based flow solver built on a systolic architecture. We show that the fractional-step method employing central difference schemes can be expressed as a systolic algorithm, and that the systolic architecture is therefore suitable for a processor dedicated to the flow solver. We have designed a 2D systolic array of cells, each of which has a micro-programmable data-path containing a MAC (multiplication and accumulation) unit and a local memory that stores the data needed for computational fluid dynamics. On an ALTERA Stratix II FPGA, we implemented 96 (= 12 × 8) cells running at 60 MHz. Since the MAC unit has both an adder and a multiplier for single-precision floating-point numbers, the total peak performance is 11.5 (= 96 × 60 MHz × 2) GFlops. As a benchmark, we chose a 2D square driven cavity flow computed with the fractional-step method. For this computation, the FPGA-based processor, running at only 60 MHz, was 7.14 and 6.41 times faster than a Pentium 4 processor at 3.2 GHz and an Itanium 2 at 1.4 GHz, respectively.
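The peak-performance figure and the locality that makes central differences a fit for a cell array can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `peak_gflops` and `laplacian_step`, the choice of a Laplacian as the example operator, and the uniform grid spacing `h` are hypothetical and are not taken from the paper, which does not describe its cell microprogram here. The exact product 96 × 60 MHz × 2 is 11.52 GFlops, which the abstract quotes as 11.5.

```python
# Hypothetical illustration; names and the example operator are assumptions,
# not the paper's implementation.

def peak_gflops(cells, clock_mhz, flops_per_cycle):
    """Peak rate in GFlops for an array of identical cells."""
    return cells * clock_mhz * 1e6 * flops_per_cycle / 1e9


def laplacian_step(grid, h):
    """Second-order central-difference Laplacian on interior points.

    Each interior point reads only its four nearest neighbours, which is
    the kind of local data exchange a 2D array of cells can carry out.
    Boundary points are left at 0.0 in the result.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]
                         - 4.0 * grid[i][j]) / (h * h)
    return out


# 96 cells, 60 MHz, one add plus one multiply per cycle per MAC unit.
peak = peak_gflops(cells=12 * 8, clock_mhz=60, flops_per_cycle=2)
```

Because every update depends only on adjacent points, the same loop body could in principle be assigned to one cell per grid region with neighbour-to-neighbour transfers, which is the property the abstract relies on when it calls the method a systolic algorithm.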