Exploiting Program Branch Probabilities in Hardware Compilation

Authors: Henry Styles; Wayne Luk
Venue: IEEE Transactions on Computers
Year: 2004

Abstract

This paper explores using information about program branch probabilities to optimize the results of hardware compilation. The basic premise is to improve hardware utilization by dedicating more resources to branches that execute more frequently. New hardware compilation and flow control schemes are presented that allow the computation rate of each branch to be matched to its observed branch probability. We propose an analytical queueing network performance model to determine the optimal basic block computation rates for a given set of observed branch probabilities. An experimental hardware compilation system has been developed to evaluate this approach. An experimental study on Xilinx Virtex FPGAs characterizes the branch optimization design space for two complex applications: video feature extraction and progressive refinement radiosity. For designs of equal performance, branch-optimized designs require 24 percent and 27.5 percent less area for the two applications. For designs of equal area, branch-optimized designs run up to three times faster. Our analytical performance model is shown to be highly accurate, with relative error between $1.1 \times 10^{-4}$ and 0.12.
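The core idea, giving frequently executed basic blocks more computation rate, can be illustrated with elementary queueing-network reasoning. The following is a minimal sketch under stated assumptions and is not the authors' analytical model: it ignores queueing delay, blocking and buffer sizes, and uses only the asymptotic bottleneck bound. It assumes branch probabilities are given as a matrix over basic blocks, that a block's area grows linearly with its computation rate (the `area_per_rate` parameter), and that throughput is limited by the most heavily loaded block. The functions `visit_ratios`, `throughput` and `balanced_rates` are hypothetical names introduced only for illustration.

```python
# Hypothetical sketch (not the paper's model): visit ratios from branch
# probabilities, plus the asymptotic bottleneck bound on throughput.
from typing import List


def visit_ratios(branch: List[List[float]], entry: int = 0,
                 tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Mean executions of each basic block per program invocation.

    branch[i][j] is the probability that block j runs next after block i;
    whatever probability is missing from row i is the chance of exiting.
    Solves the traffic equations v = e + P^T v by fixed-point iteration,
    which converges when the exit is reachable from every block.
    """
    n = len(branch)
    v = [0.0] * n
    for _ in range(max_iter):
        # One invocation enters at `entry`; each block forwards its visits
        # to its successors in proportion to the branch probabilities.
        new = [1.0 if j == entry else 0.0 for j in range(n)]
        for i in range(n):
            for j in range(n):
                new[j] += v[i] * branch[i][j]
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new
    raise RuntimeError("visit ratios did not converge; is the exit reachable?")


def throughput(rates: List[float], visits: List[float]) -> float:
    """Upper bound on invocations completed per cycle.

    Block i can execute at most rates[i] times per cycle and every
    invocation needs visits[i] executions of it, so the most heavily
    loaded block (smallest rates[i] / visits[i]) limits the design.
    """
    return min(r / v for r, v in zip(rates, visits) if v > 0)


def balanced_rates(visits: List[float], area_per_rate: List[float],
                   area_budget: float) -> List[float]:
    """Rates proportional to visit ratios, scaled to use the whole budget.

    Assumes (hypothetically) that block i occupies area_per_rate[i] * rate[i].
    Under the bound above, equalising rates[i] / visits[i] maximises
    throughput for a fixed total area.
    """
    x = area_budget / sum(a * v for a, v in zip(area_per_rate, visits))
    return [x * v for v in visits]


if __name__ == "__main__":
    # Hypothetical CFG: block 0 branches to block 1 (p = 0.9) or block 2
    # (p = 0.1); both rejoin at block 3, which exits.
    P = [[0.0, 0.9, 0.1, 0.0],
         [0.0, 0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0, 0.0]]
    v = visit_ratios(P)                      # [1.0, 0.9, 0.1, 1.0]
    area = [1.0, 1.0, 1.0, 1.0]
    print(throughput([1.0] * 4, v))          # uniform rates, area 4.0: 1.0
    print(throughput(balanced_rates(v, area, 4.0), v))  # about 1.33
    print(throughput(balanced_rates(v, area, 3.0), v))  # 1.0 with 25% less area
```

In this toy example, the rarely taken block 2 wastes most of its area under a uniform allocation, so matching rates to visit ratios either raises throughput at equal area or keeps throughput at lower area, mirroring the two kinds of comparison the abstract reports (the numbers here are illustrative only). Fixed-point iteration keeps the sketch dependency-free; a direct linear solver would be preferable for large control-flow graphs, and real hardware offers only discrete rate choices, so the continuous allocation is an idealisation.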