Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Improving the ratio of memory operations to floating-point operations in loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Digital signal processing (3rd ed.): principles, algorithms, and applications
Digital signal processing (3rd ed.): principles, algorithms, and applications
Maps: a compiler-managed memory system for raw machines
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
PipeRench: a co/processor for streaming multimedia acceleration
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Hardware-software co-design of embedded reconfigurable architectures
Proceedings of the 37th Annual Design Automation Conference
Evaluation of the streams-C C-to-FPGA compiler: an applications perspective
FPGA '01 Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays
An automated process for compiling dataflow graphs into reconfigurable hardware
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on low power electronics and design
Understanding Behavioral Synthesis: A Practical Guide to High-Level Design
Understanding Behavioral Synthesis: A Practical Guide to High-Level Design
Optimizing Supercompilers for Supercomputers
Optimizing Supercompilers for Supercomputers
Specifying and Compiling Applications for RaPiD
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
Parallelizing Applications into Silicon
FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
A Bit-Serial Implementation of the International Data Encryption Algorithm IDEA
FCCM '00 Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines
Fast Area Estimation to Support Compiler Optimizations in FPGA-Based Reconfigurable Systems
FCCM '02 Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Coarse-Grain Pipelining on Multiple FPGA Architectures
FCCM '02 Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Using estimates from behavioral synthesis tools in compiler-directed design space exploration
Proceedings of the 40th annual Design Automation Conference
Global resource sharing for synthesis of control data flow graphs on FPGAs
Proceedings of the 40th annual Design Automation Conference
Compiler-generated communication for pipelined FPGA applications
Proceedings of the 40th annual Design Automation Conference
Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
ARCHITECT-R: a system for reconfigurable robots design
Proceedings of the 2003 ACM symposium on Applied computing
Custom Data Layout for Memory Parallelism
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Input data reuse in compiling window operations onto reconfigurable hardware
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Applications of storage mapping optimization to register promotion
Proceedings of the 18th annual international conference on Supercomputing
Performance and Area Modeling of Complete FPGA Designs in the Presence of Loop Transformations
IEEE Transactions on Computers
The Energy Impact of Aggressive Loop Fusion
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
A scheduling algorithm for optimization and early planning in high-level synthesis
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Evaluating heuristics in automatically mapping multi-loop applications to FPGAs
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
An Infrastructure to Functionally Test Designs Generated by Compilers Targeting FPGAs
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
High-level synthesis for large bit-width multipliers on FPGAs: a case study
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Compiler optimization of embedded applications for an adaptive SoC architecture
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Automatic Synthesis of Efficient Intrusion Detection Systems on FPGAs
IEEE Transactions on Dependable and Secure Computing
Automatic mapping of nested loops to FPGAS
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hardware/software partitioning with multi-version implementation exploration
Proceedings of the 18th ACM Great Lakes symposium on VLSI
Achieving programming model abstractions for reconfigurable computing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Synthesis of reconfigurable high-performance multicore systems
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
A compiler intermediate representation for reconfigurable fabrics
International Journal of Parallel Programming
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
A framework for core-level modeling and design of reconfigurable computing algorithms
Proceedings of the Third International Workshop on High-Performance Reconfigurable Computing Technology and Applications
Modern development methods and tools for embedded reconfigurable systems: A survey
Integration, the VLSI Journal
A design space exploration algorithm in compiling window operation onto reconfigurable hardware
International Journal of Computers and Applications
Optimized generation of memory structure in compiling window operations onto reconfigurable hardware
ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
Design space exploration acceleration through operation clustering
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A framework for compiler driven design space exploration for embedded system customization
ASIAN'04 Proceedings of the 9th Asian Computing Science conference on Advances in Computer Science: dedicated to Jean-Louis Lassez on the Occasion of His 5th Cycle Birthday
Divide and conquer high-level synthesis design space exploration
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on verification challenges in the concurrent world
"Smart" design space sampling to predict Pareto-optimal solutions
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Polyhedral-based data reuse optimization for configurable computing
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Loop acceleration exploration for ASIP architecture
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Fast and standalone Design Space Exploration for High-Level Synthesis under resource constraints
Journal of Systems Architecture: the EUROMICRO Journal
Fast Design Exploration for Performance, Power and Accuracy Tradeoffs in FPGA-Based Accelerators
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Hi-index | 0.00 |
The current practice of mapping computations to custom hardware implementations requires programmers to assume the role of hardware designers. In tuning the performance of their hardware implementation, designers manually apply loop transformations such as loop unrolling. designers manually apply loop transformations. For example, loop unrolling is used to expose instruction-level parallelism at the expense of more hardware resources for concurrent operator evaluation. Because unrolling also increases the amount of data a computation requires, too much unrolling can lead to a memory bound implementation where resources are idle. To negotiate inherent hardware space-time trade-offs, designers must engage in an iterative refinement cycle, at each step manually applying transformations and evaluating their impact. This process is not only error-prone and tedious but also prohibitively expensive given the large search spaces and with long synthesis times. This paper describes an automated approach to hardware design space exploration, through a collaboration between parallelizing compiler technology and high-level synthesis tools. We present a compiler algorithm that automatically explores the large design spaces resulting from the application of several program transformations commonly used in application-specific hardware designs. Our approach uses synthesis estimation techniques to quantitatively evaluate alternate designs for a loop nest computation. We have implemented this design space exploration algorithm in the context of a compilation and synthesis system called DEFACTO, and present results of this implementation on five multimedia kernels. Our algorithm derives an implementation that closely matches the performance of the fastest design in the design space, and among implementations with comparable performance, selects the smallest design. We search on average only 0.3% of the design space. This technology thus significantly raises the level of abstraction for hardware design and explores a design space much larger than is feasible for a human designer.