Evaluating heuristics in automatically mapping multi-loop applications to FPGAs

Authors:
Heidi Ziegler; Mary Hall
Affiliations:
University of Southern California, Marina del Rey, CA; University of Southern California, Marina del Rey, CA
Venue:
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Year:
2005

Abstract

This paper presents a set of measurements that characterize the design space for automatically mapping high-level algorithms, expressed in C and consisting of multiple loop nests, onto an FPGA. We extend a prior compiler algorithm that derived optimized FPGA implementations for individual loop nests. We focus on the space-time tradeoffs that arise when constrained chip area is shared among multiple computations organized as an asynchronous pipeline. Intermediate results are communicated on chip, and communication analysis generates this communication automatically. Other analyses and transformations, also associated with parallelizing compiler technology, are used to perform high-level optimization of the designs. We vary the amount of parallelism in individual loop nests with the goal of deriving an overall design that makes the most effective use of chip resources. We describe several heuristics for automatically searching the space and a set of metrics for evaluating and comparing designs. From results obtained through an automated process, we demonstrate that heuristics derived through sophisticated compiler analysis are the most effective at navigating this complex search space, particularly for more complex applications.
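
The abstract does not spell out the heuristics themselves, so the following is only an illustrative sketch of the kind of search it describes: varying the parallelism of individual loop nests in a pipeline while they share a fixed chip area. Everything in it is an assumption made for illustration and is not taken from the paper: the `Stage` type, the idealized cost model (unrolling by a factor u divides a stage's cycles by u and multiplies its area by u), and the greedy rule of repeatedly doubling the unroll factor of the slowest stage.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One loop nest in the pipeline (hypothetical model, not the paper's)."""
    name: str
    base_cycles: int  # cycles per pipeline item at unroll factor 1 (assumed)
    base_area: int    # area units at unroll factor 1 (assumed)
    unroll: int = 1

    def cycles(self) -> float:
        # Idealized assumption: unrolling by u divides latency by u.
        return self.base_cycles / self.unroll

    def area(self) -> int:
        # Idealized assumption: unrolling by u multiplies area by u.
        return self.base_area * self.unroll


def total_area(stages):
    return sum(s.area() for s in stages)


def bottleneck(stages):
    # The slowest stage bounds the throughput of the whole pipeline.
    return max(stages, key=lambda s: s.cycles())


def greedy_balance(stages, capacity):
    """Double the unroll factor of the slowest stage until area runs out."""
    while True:
        slow = bottleneck(stages)
        extra = slow.area()  # doubling the unroll factor doubles the stage's area
        if total_area(stages) + extra > capacity:
            return stages
        slow.unroll *= 2


if __name__ == "__main__":
    design = greedy_balance(
        [Stage("a", 100, 10), Stage("b", 400, 10), Stage("c", 200, 10)],
        capacity=75,
    )
    for s in design:
        print(s.name, s.unroll, s.cycles(), s.area())
```

With these made-up numbers the search spends area only on the stages that limit throughput and stops once the next doubling would exceed the capacity. A real compiler would replace the linear cost model with estimates from synthesis tools and would also account for memory bandwidth and on-chip communication.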