Compiler-generated communication for pipelined FPGA applications

Authors:
Heidi E. Ziegler;Mary W. Hall;Pedro C. Diniz
Affiliations:
University of Southern California, Marina del Rey, CA;University of Southern California, Marina del Rey, CA;University of Southern California, Marina del Rey, CA
Venue:
Proceedings of the 40th annual Design Automation Conference
Year:
2003

Citing 13
Cited 6

Compilers: principles, techniques, and tools

Compilers: principles, techniques, and tools
A technique for summarizing data access and its use in parallelism enhancing transformations

PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Communication optimization and code generation for distributed memory machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Preliminary experiences with the Fortran D compiler

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Compiler optimizations for eliminating barrier synchronization

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Detecting coarse-grain parallelism using an interprocedural parallelizing compiler

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Parallelizing compiler techniques based on linear inequalities

Parallelizing compiler techniques based on linear inequalities
PipeRench: a co/processor for streaming multimedia acceleration

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Adapting software pipelining for reconfigurable computing

CASES '00 Proceedings of the 2000 international conference on Compilers, architecture, and synthesis for embedded systems
A compiler approach to fast hardware design space exploration in FPGA-based systems

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
NAPA C: Compiling for a Hybrid RISC/FPGA Architecture

FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
Pipeline Vectorization for Reconfigurable Systems

FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Coarse-Grain Pipelining on Multiple FPGA Architectures

FCCM '02 Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines

Evaluating heuristics in automatically mapping multi-loop applications to FPGAs

Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Synthesis of multi-dimensional high-speed FIFOs for out-of-order communication

ARCS'08 Proceedings of the 21st international conference on Architecture of computing systems
Reducing Memory Constraints in Modulo Scheduling Synthesis for FPGAs

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Array replication to increase parallelism in applications mapped to configurable architectures

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Improving high level synthesis optimization opportunity through polyhedral transformations

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
An FPGA-based multi-core approach for pipelining computing stages

Proceedings of the 28th Annual ACM Symposium on Applied Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we describe a set of compiler analyses and an implementation that automatically map a sequential and un-annotated C program into a pipelined implementation, targeted for an FPGA with multiple external memories. For this purpose, we extend array data-flow analysis techniques from parallelizing compilers to identify pipeline stages, required inter-pipeline stage communication, and opportunities to find a minimal program execution time by trading communication overhead with the amount of computation overlap in different stages. Using the results of this analysis, we automatically generate application-specific pipelined FPGA hardware designs. We use a sample image processing kernel to illustrate these concepts. Our algorithm finds a solution in which transmitting a row of an array between pipeline stages per communication instance leads to a speedup of 1.76 over an implementation that communicates the entire array at once.