Increasing hardware efficiency with multifunction loop accelerators

Authors:
Kevin Fan; Manjunath Kudlur; Hyunchul Park; Scott Mahlke
Affiliations:
University of Michigan, Ann Arbor, MI (all authors)
Venue:
CODES+ISSS '06: Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis
Year:
2006

Abstract

To meet the conflicting goals of high performance and low cost in embedded systems, critical application loop nests are commonly executed on specialized hardware accelerators. These loop accelerators are traditionally designed in a single-function manner, wherein each loop nest is implemented as a dedicated hardware block. This paper focuses on hardware sharing across loop nests by creating multifunction loop accelerators, that is, accelerators capable of executing multiple algorithms. A compiler-based system for automatically synthesizing multifunction loop accelerator architectures from C code is presented. We compare the effectiveness of three architecture synthesis approaches with varying levels of complexity: sum of individual accelerators, union of individual accelerators, and joint accelerator synthesis. Experiments show that multifunction accelerators achieve substantial hardware savings over combinations of single-function designs. In addition, the union approach to multifunction synthesis is shown to create low-cost hardware by exploiting hardware sharing while remaining computationally tractable.
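The intuition behind the first two synthesis approaches can be illustrated with a deliberately simplified cost model. This is a minimal sketch, not the authors' method. It assumes that each single-function accelerator is summarized only by a count of function units (FUs) per type, that the loops mapped to a multifunction accelerator never execute concurrently, and that area is a weighted sum of FU counts. The FU types, the area weights, and the helper names (`sum_of_accelerators`, `union_of_accelerators`, `area`) are hypothetical. Interconnect, registers, and the joint synthesis approach are not modeled.

```python
from collections import Counter

# Hypothetical per-FU area weights (arbitrary units); not taken from the paper.
AREA = {"adder": 1.0, "multiplier": 4.0, "memory_port": 2.0}


def sum_of_accelerators(designs):
    """'Sum' approach: every loop keeps its own dedicated hardware,
    so FU counts simply add up across loops."""
    total = Counter()
    for d in designs:
        total.update(d)  # Counter.update adds counts
    return total


def union_of_accelerators(designs):
    """Simplified 'union' approach: loops run one at a time on shared hardware,
    so each FU type needs only as many instances as the most demanding loop."""
    total = Counter()
    for d in designs:
        total |= d  # Counter union keeps the maximum count per FU type
    return total


def area(fus):
    """Weighted sum of FU counts under the hypothetical AREA table."""
    return sum(AREA[kind] * count for kind, count in fus.items())


# Two hypothetical single-function accelerators for two different loop nests.
loop_a = Counter(adder=2, multiplier=1, memory_port=2)
loop_b = Counter(adder=1, multiplier=2, memory_port=1)

print(area(sum_of_accelerators([loop_a, loop_b])))    # 21.0
print(area(union_of_accelerators([loop_a, loop_b])))  # 14.0
```

Under this model the union never costs more than the sum, because each FU type's count is the maximum over loops rather than the total. A real datapath merge would also have to account for the interconnect and storage needed to route each loop's data through the shared units, which this sketch omits.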