Domain-Specific Language and Compiler for Stencil Computation on FPGA-Based Systolic Computational-Memory Array

Authors:
Wang Luzhou; Kentaro Sano; Satoru Yamamoto
Affiliations:
Graduate School of Information Sciences, Tohoku University, Sendai, Japan; Graduate School of Information Sciences, Tohoku University, Sendai, Japan; Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Venue:
ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
Year:
2012

Abstract

This paper presents a domain-specific language for stencil computation (DSLSC) and its compiler for our FPGA-based systolic computational-memory array (SCMA). In DSLSC, a stencil computation is programmed by describing its mathematical form rather than by explicitly writing an optimized procedure. The compiler automatically parallelizes the stencil computation across the processing elements (PEs) of the SCMA and schedules multiply-and-add operations for the PEs, taking into account the data-reference delays incurred through a local memory or through the communication FIFOs between PEs. When compiling 2D Jacobi computations with 3×3 and 5×5 stencils for arbitrary grid sizes, the compiler achieves high PE utilization of 85.6% and 92.18%, respectively, close to the ideal values of 87.5% and 93.75%.
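To make the abstract's terms concrete, the following is a minimal Python sketch. It assumes that a stencil's "mathematical form" can be written as a table mapping neighbor offsets to coefficients, and it expands that table into the per-point multiply-and-add operations of one Jacobi sweep. The names `box_stencil`, `jacobi_step`, and `partition` are hypothetical. The uniform weights are illustrative only, and the near-equal block split shows just one way a grid of arbitrary size might be divided among PEs. None of this is DSLSC syntax or the compiler's actual algorithm, and the sketch ignores the local-memory and FIFO latencies that the real scheduler must handle.

```python
from typing import Dict, List, Tuple

# Hypothetical "mathematical form" of a stencil: (dy, dx) offset -> coefficient.
Stencil = Dict[Tuple[int, int], float]


def box_stencil(k: int) -> Stencil:
    """Uniform k x k averaging stencil (k odd, e.g. 3 or 5); weights are illustrative."""
    r = k // 2
    w = 1.0 / (k * k)
    return {(dy, dx): w for dy in range(-r, r + 1) for dx in range(-r, r + 1)}


def jacobi_step(grid: List[List[float]], stencil: Stencil) -> List[List[float]]:
    """One Jacobi sweep: each interior point is recomputed from the OLD grid
    as a chain of multiply-and-add operations; boundary points stay fixed."""
    r = max(max(abs(dy), abs(dx)) for dy, dx in stencil)  # stencil radius
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # separate output so reads never see updated values
    for i in range(r, rows - r):
        for j in range(r, cols - r):
            acc = 0.0
            for (dy, dx), c in stencil.items():
                acc += c * grid[i + dy][j + dx]  # one multiply-and-add
            new[i][j] = acc
    return new


def partition(n: int, parts: int) -> List[range]:
    """Hypothetical split of n grid rows into `parts` nearly equal blocks,
    for grid sizes that are not a multiple of the PE count."""
    base, extra = divmod(n, parts)
    blocks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)  # first `extra` blocks get one more row
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    g = [[0.0] * 6 for _ in range(6)]
    g[0] = [1.0] * 6  # fixed "hot" top boundary
    g1 = jacobi_step(g, box_stencil(3))
    print(round(g1[1][1], 4))                   # 0.3333 (three boundary neighbors, each weighted 1/9)
    print([len(b) for b in partition(10, 4)])   # [3, 3, 2, 2]
```

With full box stencils, a 3×3 stencil costs 9 multiply-and-add operations per grid point and a 5×5 stencil costs 25. Operation chains of this kind are what a compiler would have to distribute over the PEs and schedule around data-reference delays.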