A compiler approach to fast hardware design space exploration in FPGA-based systems

Authors:
Byoungro So; Mary W. Hall; Pedro C. Diniz
Affiliations:
University of Southern California, Marina del Rey, California; University of Southern California, Marina del Rey, California; University of Southern California, Marina del Rey, California
Venue:
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation
Year:
2002

Abstract

The current practice of mapping computations to custom hardware implementations requires programmers to assume the role of hardware designers. In tuning the performance of their hardware implementation, designers manually apply loop transformations. For example, loop unrolling is used to expose instruction-level parallelism at the expense of more hardware resources for concurrent operator evaluation. Because unrolling also increases the amount of data a computation requires, too much unrolling can lead to a memory-bound implementation in which resources sit idle. To negotiate the inherent hardware space-time trade-offs, designers must engage in an iterative refinement cycle, at each step manually applying transformations and evaluating their impact. This process is not only error-prone and tedious but also prohibitively expensive, given the large search spaces and long synthesis times. This paper describes an automated approach to hardware design space exploration, through a collaboration between parallelizing compiler technology and high-level synthesis tools. We present a compiler algorithm that automatically explores the large design spaces resulting from the application of several program transformations commonly used in application-specific hardware designs. Our approach uses synthesis estimation techniques to quantitatively evaluate alternate designs for a loop nest computation. We have implemented this design space exploration algorithm in the context of a compilation and synthesis system called DEFACTO, and present results of this implementation on five multimedia kernels. Our algorithm derives an implementation that closely matches the performance of the fastest design in the design space and, among implementations with comparable performance, selects the smallest design. We search on average only 0.3% of the design space. This technology thus significantly raises the level of abstraction for hardware design and explores a design space much larger than is feasible for a human designer.
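
The space-time trade-off described in the abstract can be illustrated with a small sketch. This is not the DEFACTO algorithm: the cost model, every parameter value, and the names `estimate` and `explore` are hypothetical stand-ins for a real synthesis estimator and search. The sketch only shows the idea that unrolling speeds up computation until the design becomes memory bound, and that among designs with comparable performance the smallest one is preferred.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Estimate:
    unroll: int   # unroll factor of the candidate design
    cycles: int   # estimated execution time in cycles
    area: int     # estimated hardware area in arbitrary units


def estimate(unroll, iterations=1024, op_latency=4, words_per_iter=2,
             mem_words_per_cycle=4, area_per_copy=100, base_area=50):
    """Toy stand-in for a synthesis estimator (all parameters are assumptions)."""
    # Compute time shrinks as more loop-body copies run concurrently.
    compute = ceil(iterations / unroll) * op_latency
    # Memory time is fixed by the data volume and the available bandwidth.
    memory = ceil(iterations * words_per_iter / mem_words_per_cycle)
    # The slower of the two bounds the design; area grows with each copy.
    return Estimate(unroll, max(compute, memory),
                    base_area + unroll * area_per_copy)


def explore(max_unroll=64, capacity=4000, tolerance=0.05):
    """Try power-of-two unroll factors and return the smallest near-fastest design."""
    candidates = []
    unroll = 1
    while unroll <= max_unroll:
        design = estimate(unroll)
        if design.area > capacity:
            break  # larger unroll factors cannot fit either
        candidates.append(design)
        unroll *= 2
    if not candidates:
        raise ValueError("no design fits within the capacity")
    fastest = min(d.cycles for d in candidates)
    comparable = [d for d in candidates if d.cycles <= fastest * (1 + tolerance)]
    return min(comparable, key=lambda d: d.area)


if __name__ == "__main__":
    print(explore())
```

With these assumed parameters, unroll factors of 8, 16, and 32 all reach the same memory-bound cycle count, so the sketch selects the factor of 8 as the smallest of the comparable designs.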