A compiler approach to fast hardware design space exploration in FPGA-based systems

  • Authors:
  • Byoungro So;Mary W. Hall;Pedro C. Diniz

  • Affiliations:
  • University of Southern California, Marina del Rey, California;University of Southern California, Marina del Rey, California;University of Southern California, Marina del Rey, California

  • Venue:
  • PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

The current practice of mapping computations to custom hardware implementations requires programmers to assume the role of hardware designers. In tuning the performance of their hardware implementation, designers manually apply loop transformations such as loop unrolling. designers manually apply loop transformations. For example, loop unrolling is used to expose instruction-level parallelism at the expense of more hardware resources for concurrent operator evaluation. Because unrolling also increases the amount of data a computation requires, too much unrolling can lead to a memory bound implementation where resources are idle. To negotiate inherent hardware space-time trade-offs, designers must engage in an iterative refinement cycle, at each step manually applying transformations and evaluating their impact. This process is not only error-prone and tedious but also prohibitively expensive given the large search spaces and with long synthesis times. This paper describes an automated approach to hardware design space exploration, through a collaboration between parallelizing compiler technology and high-level synthesis tools. We present a compiler algorithm that automatically explores the large design spaces resulting from the application of several program transformations commonly used in application-specific hardware designs. Our approach uses synthesis estimation techniques to quantitatively evaluate alternate designs for a loop nest computation. We have implemented this design space exploration algorithm in the context of a compilation and synthesis system called DEFACTO, and present results of this implementation on five multimedia kernels. Our algorithm derives an implementation that closely matches the performance of the fastest design in the design space, and among implementations with comparable performance, selects the smallest design. We search on average only 0.3% of the design space. This technology thus significantly raises the level of abstraction for hardware design and explores a design space much larger than is feasible for a human designer.