Streamroller: automatic synthesis of prescribed throughput accelerator pipelines

Authors:
Manjunath Kudlur;Kevin Fan;Scott Mahlke
Affiliations:
University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI
Venue:
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Year:
2006

Citing 14
Cited 3

Cathedral-III: Architecture-driven high-level synthesis for high throughput DSP applications

DAC '91 Proceedings of the 28th ACM/IEEE Design Automation Conference
The ALPHA language and its use for the design of systolic arrays

Journal of VLSI Signal Processing Systems - Special issue: algorithms and parallel VLSI architecture
Iterative modulo scheduling: an algorithm for software pipelining loops

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
SUIF: an infrastructure for research on parallelizing and optimizing compilers

ACM SIGPLAN Notices
The system architect's workbench

DAC '88 Proceedings of the 25th ACM/IEEE Design Automation Conference
HERCULES—a system for high-level synthesis

DAC '88 Proceedings of the 25th ACM/IEEE Design Automation Conference
Design process model in the Yorktown Silicon Compiler

DAC '88 Proceedings of the 25th ACM/IEEE Design Automation Conference
Allocation and scheduling of conditional task graph in hardware/software co-synthesis

Proceedings of the conference on Design, automation and test in Europe
PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators

Journal of VLSI Signal Processing Systems
The MIMOLA design system: Detailed description of the software system

DAC '79 Proceedings of the 16th Design Automation Conference
Synthesis of Application Specific Multiprocessor Architectures for Process Networks

VLSID '04 Proceedings of the 17th International Conference on VLSI Design
System Design Using Kahn Process Networks: The Compaan/Laura Approach

Proceedings of the conference on Design, automation and test in Europe - Volume 1
Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Increasing hardware efficiency with multifunction loop accelerators

CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis

MPSoC Design Using Application-Specific Architecturally Visible Communication

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Reducing Memory Constraints in Modulo Scheduling Synthesis for FPGAs

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
The benefits of using variable-length pipelined operations in high-level synthesis

ACM Transactions on Embedded Computing Systems (TECS)

Abstract

In this paper, we present a methodology for designing a pipeline of accelerators for an application. The application is modeled in sequential C with simple stylizations. Synthesizing the accelerator pipeline involves designing loop accelerators for individual kernels, instantiating buffers for the arrays used in the application, and connecting these building blocks to form a pipeline. A compiler-based system automatically synthesizes loop accelerators for individual kernels at varying performance levels. An integer linear programming formulation that simultaneously optimizes the cost of the loop accelerators and the cost of the memory buffers is proposed to compose the loop accelerators into an accelerator pipeline for the whole application. Case studies of applications, including FMRadio and Beamformer, illustrate the design methodology. Experiments show that hardware sharing yields significant cost savings while the prescribed throughput requirements are still met.
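
To make the selection problem in the abstract concrete, the following is a minimal sketch, not the paper's integer linear program. It assumes each kernel comes with a few candidate loop accelerators at different performance levels, that a design is feasible when every kernel finishes one task within a given cycle budget, and that every buffer between adjacent kernels has the same fixed cost. The kernel names, design points, costs, and the function `cheapest_pipeline` are all hypothetical, and exhaustive enumeration stands in for an ILP solver.

```python
from itertools import product

# Hypothetical design points per kernel: (cycles per task, accelerator cost).
# Kernel names and numbers are illustrative only, not taken from the paper.
DESIGN_POINTS = {
    "lowpass": [(100, 9.0), (200, 5.0), (400, 3.0)],
    "demod": [(150, 6.0), (300, 3.5)],
    "equalize": [(120, 8.0), (240, 4.5), (480, 2.5)],
}
BUFFER_COST = 2.0  # assumed fixed cost of each buffer between adjacent kernels


def cheapest_pipeline(design_points, cycle_budget, buffer_cost=BUFFER_COST):
    """Return (total_cost, {kernel: (cycles, cost)}) or None if infeasible.

    Tries every combination of one design point per kernel and keeps the
    cheapest combination in which each kernel meets the cycle budget.
    """
    kernels = list(design_points)
    n_buffers = max(len(kernels) - 1, 0)  # one buffer between neighbours
    best = None
    for choice in product(*(design_points[k] for k in kernels)):
        # Throughput constraint: every stage must fit in the budget.
        if any(cycles > cycle_budget for cycles, _ in choice):
            continue
        total = sum(cost for _, cost in choice) + n_buffers * buffer_cost
        if best is None or total < best[0]:
            best = (total, dict(zip(kernels, choice)))
    return best


if __name__ == "__main__":
    print(cheapest_pipeline(DESIGN_POINTS, cycle_budget=250))
```

Under these simplifying assumptions the choice decomposes per kernel, so the search is only illustrative. The coupling the abstract describes, between accelerator cost, buffer cost, and hardware sharing, is what would motivate a joint formulation such as an ILP.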