Prototype implementation of array-processor extensible over multiple FPGAs for scalable stencil computation

Authors:
Kentaro Sano;Luzhou Wang;Satoru Yamamoto
Affiliations:
Tohoku University, Sendai, Japan;Tohoku University, Sendai, Japan;Tohoku University, Sendai, Japan
Venue:
ACM SIGARCH Computer Architecture News
Year:
2011

Abstract

This paper demonstrates and evaluates the performance and scalability of the systolic computational-memory array (SCMA) for stencil computation, a typical computing kernel of scientific simulation. We describe the basic architecture of the SCMA, and show the requirements and the design that allow SCMAs to operate scalably over multiple devices. We implement a prototype of the SCMA with three ALTERA Stratix III FPGAs, which form a 1--3 FPGA array by connecting three DE3 boards with different clock sources. The prototype SCMA demonstrates that the difference in operating clock frequency hardly influences the total execution cycles, although it causes a small number of stall cycles in sub-SCMAs on different FPGAs. With three benchmark programs of typical computing kernels based on the finite difference method, we show that adding FPGAs increases performance in proportion to the number of devices, resulting in almost linear speedup.