Two-level microprocessor-accelerator partitioning

Authors:
Scott Sirowy; Yonghui Wu; Stefano Lonardi; Frank Vahid
Affiliations:
University of California, Riverside; University of California, Riverside; University of California, Riverside; University of California, Riverside and University of California, Irvine
Venue:
Proceedings of the Conference on Design, Automation and Test in Europe
Year:
2007

Abstract

The integration of microprocessors and field-programmable gate array (FPGA) fabric on a single chip increases both the utility and the necessity of tools that automatically move software functions from the microprocessor to accelerators on the FPGA to improve performance or energy. Such hardware/software partitioning for modern FPGAs involves partitioning functions among two levels of accelerator groups: tightly-coupled accelerators that have fast single-clock-cycle access to the microprocessor's memory, and loosely-coupled accelerators that access memory through a bridge to avoid slowing the main clock period with their longer critical paths. We introduce this new two-level accelerator-partitioning problem and describe a novel optimal dynamic programming algorithm to solve it. By exploiting the size constraint imposed by FPGAs, the algorithm has effectively quadratic runtime complexity and runs in just a few seconds for examples with up to 25 accelerators. It obtains an average performance improvement of 35% compared to a traditional single-level bus architecture.
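
The abstract does not spell out the algorithm, so the following Python sketch is not the authors' method. It only illustrates what a two-level partitioning decision could look like under a simplified cost model that is entirely an assumption here: each candidate function stays in software, becomes a tightly-coupled accelerator (which may stretch the main clock period to its critical path), or becomes a loosely-coupled accelerator (fixed time behind the bridge, no effect on the main clock). The names `Candidate`, `partition`, and `brute_force`, and every field and parameter, are hypothetical. The sketch tries each possible main clock period and, for each, runs a knapsack-style dynamic program over the integer FPGA area budget.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    """Hypothetical cost model for one function (all fields are assumptions)."""
    area: int            # FPGA area units if implemented as an accelerator
    sw_cycles: int       # main-clock cycles if left in software
    tight_cycles: int    # main-clock cycles if tightly coupled
    loose_ns: float      # wall time if loosely coupled (bridge overhead included)
    crit_path_ns: float  # critical path; bounds the main clock if tightly coupled


def total_time(cands, choice, base_period_ns, other_cycles):
    """Exact execution time (ns) of one assignment under the assumed model."""
    tight_paths = [c.crit_path_ns for c, ch in zip(cands, choice) if ch == "tight"]
    period = max([base_period_ns] + tight_paths)
    cycles, loose = other_cycles, 0.0
    for c, ch in zip(cands, choice):
        if ch == "sw":
            cycles += c.sw_cycles
        elif ch == "tight":
            cycles += c.tight_cycles
        else:
            loose += c.loose_ns
    return period * cycles + loose


def partition(cands, area_budget, base_period_ns, other_cycles):
    """Return (time_ns, choices) minimizing time within the area budget."""
    periods = sorted({base_period_ns} |
                     {c.crit_path_ns for c in cands if c.crit_path_ns > base_period_ns})
    best = (float("inf"), ())
    for period in periods:
        # dp[a] = (best time, choices so far) using at most `a` area units,
        # with the main clock period fixed to `period`.
        dp = [(period * other_cycles, ())] * (area_budget + 1)
        for c in cands:
            options = [(0, period * c.sw_cycles, "sw"), (c.area, c.loose_ns, "loose")]
            if c.crit_path_ns <= period:  # tight coupling must fit this clock
                options.append((c.area, period * c.tight_cycles, "tight"))
            new_dp = []
            for a in range(area_budget + 1):
                cell = None
                for area, cost, label in options:
                    if area <= a:
                        prev_time, prev_choice = dp[a - area]
                        if cell is None or prev_time + cost < cell[0]:
                            cell = (prev_time + cost, prev_choice + (label,))
                new_dp.append(cell)
            dp = new_dp
        if dp[area_budget][0] < best[0]:
            best = dp[area_budget]
    return best[0], list(best[1])


def brute_force(cands, area_budget, base_period_ns, other_cycles):
    """Exhaustive cross-check for small inputs (3^n assignments)."""
    best = (float("inf"), None)
    for choice in product(("sw", "tight", "loose"), repeat=len(cands)):
        used = sum(c.area for c, ch in zip(cands, choice) if ch != "sw")
        if used <= area_budget:
            t = total_time(cands, choice, base_period_ns, other_cycles)
            if t < best[0]:
                best = (t, list(choice))
    return best


if __name__ == "__main__":
    cands = [
        Candidate(area=4, sw_cycles=5000, tight_cycles=500, loose_ns=9000.0, crit_path_ns=9.0),
        Candidate(area=3, sw_cycles=4000, tight_cycles=400, loose_ns=6000.0, crit_path_ns=14.0),
        Candidate(area=5, sw_cycles=2000, tight_cycles=300, loose_ns=5000.0, crit_path_ns=11.0),
    ]
    print(partition(cands, area_budget=8, base_period_ns=10.0, other_cycles=1000))
```

Fixing the clock period before each knapsack pass keeps the per-function costs independent of one another, which is what makes the table-based recurrence valid in this toy model. The sketch runs in time proportional to (number of candidate periods) × (number of candidates) × (area budget); it makes no attempt to reproduce the complexity or the results reported in the abstract.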