ACM Computing Surveys (CSUR)
Survey of closed queueing networks with blocking
ACM Computing Surveys (CSUR)
Automatic operator configuration in the synthesis of pipelined architectures
DAC '90 Proceedings of the 27th ACM/IEEE Design Automation Conference
SUIF: an infrastructure for research on parallelizing and optimizing compilers
ACM SIGPLAN Notices
Fast, minimum storage ray-triangle intersection
Journal of Graphics Tools
Probabilistic modelling
Retargetable compiler technology for embedded systems: tools and applications
Retargetable compiler technology for embedded systems: tools and applications
Computer Architecture: Pipelined and Parallel Processor Design
Computer Architecture: Pipelined and Parallel Processor Design
Synthesis and Optimization of Digital Circuits
Synthesis and Optimization of Digital Circuits
Managing multi-configuration hardware via dynamic working set analysis
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Sea Cucumber: A Synthesizing Compiler for FPGAs
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
Compiling Application-Specific Hardware
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
Incremental reconfiguration for pipelined applications
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Reconfigurable Computing for Augmented Reality
FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Stream-Oriented FPGA Computing in the Streams-C High Level Language
FCCM '00 Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines
Coarse-Grain Pipelining on Multiple FPGA Architectures
FCCM '02 Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Accelerating Radiosity Calculations Using Reconfigurable Platforms
FCCM '02 Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
The working set model for program behavior
SOSP '67 Proceedings of the first ACM symposium on Operating System Principles
High-level Synthesis of Multi-process Behavioral Descriptions
VLSID '03 Proceedings of the 16th International Conference on VLSI Design
Performance and Area Modeling of Complete FPGA Designs in the Presence of Loop Transformations
IEEE Transactions on Computers
Automatic Mapping of Multiple Applications to Multiple Adaptive Computing Systems
FCCM '01 Proceedings of the the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Lava and JBits: From HDL to Bitstream in Seconds
FCCM '01 Proceedings of the the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Theory, Volume 1, Queueing Systems
Theory, Volume 1, Queueing Systems
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Energy reduction by systematic run-time reconfigurable hardware deactivation
Transactions on High-Performance Embedded Architectures and Compilers IV
Hi-index | 14.98 |
This paper explores using information about program branch probabilities to optimize the results of hardware compilation. The basic premise is to promote utilization by dedicating more resources to branches which execute more frequently. A new hardware compilation and flow control scheme are presented which enable the computation rate of different branches to be matched to the observed branch probabilities. We propose an analytical queuing network performance model to determine the optimal settings for basic block computation rates given a set of observed branch probabilities. An experimental hardware compilation system has been developed to evaluate this approach. The branch optimization design space is characterized in an experimental study for Xilinx Virtex FPGAs of two complex applications: video feature extraction and progressive refinement radiosity. For designs of equal performance, branch-optimized designs require 24 percent and 27.5 percent less area. For designs of equal area, branch optimized designs run up to three times faster. Our analytical performance model is shown to be highly accurate with relative error between 0.12 and 1.1\times10^{-4}.