Cathedral-III: Architecture-driven high-level synthesis for high throughput DSP applications
DAC '91 Proceedings of the 28th ACM/IEEE Design Automation Conference
High-level synthesis: introduction to chip and system design
High-level synthesis: introduction to chip and system design
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Minimizing register requirements under resource-constrained rate-optimal software pipelining
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Data-parallel C on a reconfigurable logic array
The Journal of Supercomputing - Special issue on field programmable gate arrays
Efficient formulation for optimal modulo schedulers
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
PipeRench: a co/processor for streaming multimedia acceleration
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Improved interconnect sharing by identity operation insertion
ICCAD '99 Proceedings of the 1999 IEEE/ACM international conference on Computer-aided design
Performance-constrained pipelining of software loops onto reconfigurable hardware
FPGA '02 Proceedings of the 2002 ACM/SIGDA tenth international symposium on Field-programmable gate arrays
PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators
Journal of VLSI Signal Processing Systems
The Garp Architecture and C Compiler
Computer
DEFACTO: A Design Environment for Adaptive Computing Technology
Proceedings of the 11 IPPS/SPDP'99 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Global resource sharing for synthesis of control data flow graphs on FPGAs
Proceedings of the 40th annual Design Automation Conference
Mapping applications to the RaPiD configurable architecture
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
NAPA C: Compiling for a Hybrid RISC/FPGA Architecture
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
Facet: A procedure for the automated synthesis of digital systems
DAC '83 Proceedings of the 20th Design Automation Conference
Interconnect optimisation during data path allocation
EURO-DAC '90 Proceedings of the conference on European design automation
The design of dynamically reconfigurable datapath coprocessors
ACM Transactions on Embedded Computing Systems (TECS)
Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Streamroller:: automatic synthesis of prescribed throughput accelerator pipelines
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
A design flow dedicated to multi-mode architectures for DSP applications
Proceedings of the 2007 IEEE/ACM international conference on Computer-aided design
Modulo scheduling for highly customized datapaths to increase hardware reusability
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
The input-aware dynamic adaptation of area and performance for reconfigurable accelerator
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Finding the best compromise in compiling compound loops to Verilog
Journal of Systems Architecture: the EUROMICRO Journal
Impact of high-level transformations within the ROCCC framework
ACM Transactions on Architecture and Code Optimization (TACO)
High-level synthesis for designing multimode architectures
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A cyclic scheduling problem with an undetermined number of parallel identical processors
Computational Optimization and Applications
Hi-index | 0.00 |
To meet the conflicting goals of high-performance low-cost embedded systems, critical application loop nests are commonly executed on specialized hardware accelerators. These loop accelerators are traditionally designed in a single-function manner, wherein each loop nest is implemented as a dedicated hardware block. This paper focuses on hardware sharing across loop nests by creating multifunction loop accelerators, or accelerators capable of executing multiple algorithms. A compiler-based system for automatically synthesizing multifunction loop accelerator architectures from C code is presented. We compare the effectiveness of three architecture synthesis approaches with varying levels of complexity: sum of individual accelerators, union of individual accelerators, and joint accelerator synthesis. Experiments show that multifunction accelerators achieve substantial hardware savings over combinations of single-function designs. In addition, the union approach to multifunction synthesis is shown to be effective at creating low-cost hardware by exploiting hardware sharing, while remaining computationally tractable.