How to efficiently implement dynamic circuit specialization systems

Authors:
Fatma Abouelella, Tom Davidson, Wim Meeus, Karel Bruneel, Dirk Stroobandt
Affiliations:
Ghent University, Ghent, Belgium (all authors)
Venue:
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Year:
2013

Abstract

Dynamic circuit specialization (DCS) is a technique for implementing FPGA applications in which some inputs, called parameters, change slowly compared to the other inputs. Each time the parameter values change, the FPGA is reconfigured with a configuration specialized for the new parameter values. The specialized circuit is much smaller and faster than a generic circuit that must handle every possible parameter value. However, the overhead of the specialization process must be kept low to realize the benefits of DCS. This overhead consists of both the FPGA resources needed to specialize the FPGA at run time and the time the specialization takes. The introduction of parameterized configurations [Bruneel and Stroobandt 2008] has improved the efficiency of DCS implementations, but the specialization overhead still costs a considerable amount of resources and time. In this article, we explore how to build DCS systems efficiently by presenting several possible solutions for the specialization process and the overhead associated with each of them. We split the specialization process into two main phases: the evaluation phase and the configuration phase. For the evaluation phase, we compare three alternatives: the PowerPC embedded processor, the MicroBlaze soft processor, and a customized processor (CP). For the configuration phase, we compare the ICAP (Internal Configuration Access Port) with a custom configuration interface based on shift-register LUTs (SRL configuration). Each solution is used to implement a DCS system for three applications: an adaptive finite impulse response (FIR) filter, a ternary content-addressable memory (TCAM), and a regular expression matcher (RegEx). The experiments show that combining our CP with SRL configuration yields the lowest overhead in terms of both resources and time. Our CP is 1.8 and 3.5 times smaller than the PowerPC and the area-optimized implementation of the MicroBlaze, respectively.
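The two phases can be illustrated with a toy model. The sketch below is a minimal illustration under our own assumptions, not the authors' tool flow: a parameterized configuration is modeled as a list of Boolean functions of the parameter bits, one per tunable configuration bit. The evaluation phase calls every function with the new parameter values to obtain the specialized bits, and the configuration phase shifts those bits serially into a model of a 16-bit shift-register LUT (in the style of the Xilinx SRL16), whose contents then act as the LUT's truth table. The example circuit (a 4-input LUT that outputs 1 when its input equals a parameter word, i.e. one exact-match TCAM entry) and the names `tcam_lut_config`, `evaluate`, `SRL16`, and `configure` are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical model: each tunable configuration bit is a Boolean function
# of the parameter bits (static bits would simply be constants).
BitFunction = Callable[[Sequence[int]], int]


def tcam_lut_config(n_inputs: int = 4) -> List[BitFunction]:
    """Parameterized truth table of an n-input LUT that outputs 1 when the
    LUT input x equals the parameter word p (one exact-match TCAM entry).
    p is given least significant bit first."""
    def bit(i: int) -> BitFunction:
        # Entry i is 1 iff every bit of i equals the corresponding bit of p.
        return lambda p: int(all(((i >> j) & 1) == p[j] for j in range(n_inputs)))
    return [bit(i) for i in range(2 ** n_inputs)]


def evaluate(config: List[BitFunction], params: Sequence[int]) -> List[int]:
    """Evaluation phase: compute the specialized configuration bits."""
    return [f(params) for f in config]


class SRL16:
    """Toy model of a 16-bit shift-register LUT: bits are shifted in
    serially, and the stored contents act as the LUT truth table."""

    def __init__(self) -> None:
        self.bits = [0] * 16

    def shift_in(self, bit: int) -> None:
        # The new bit enters position 0; existing bits move up by one.
        self.bits = [bit] + self.bits[:-1]

    def lookup(self, x: int) -> int:
        # The LUT input selects which stored bit drives the output.
        return self.bits[x]


def configure(srl: SRL16, truth_table: List[int]) -> None:
    """Configuration phase: shift the truth table in, highest entry first,
    so that entry i ends up at position i after 16 shifts."""
    for bit in reversed(truth_table):
        srl.shift_in(bit)


if __name__ == "__main__":
    params = [1, 0, 1, 0]  # parameter word p = 0b0101, LSB first
    srl = SRL16()
    configure(srl, evaluate(tcam_lut_config(), params))
    print([srl.lookup(x) for x in range(16)])  # only entry 5 (0b0101) is 1
```

In this toy model, re-specializing the LUT for a new match word only requires re-running `evaluate` and shifting 16 new bits in; nothing else in the modeled circuit changes.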
The CP also enables a more compact representation of the parameterized configuration than either the PowerPC or the MicroBlaze. For instance, for the FIR filter, the parameterized configuration compiled for our CP is 6 to 7 times smaller than that compiled for the embedded processors.
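The abstract does not specify the CP's instruction set or how the parameterized configuration is encoded for it. Purely as an illustrative assumption, the sketch below shows one compact, processor-friendly encoding: each tunable bit's Boolean function is stored as a postfix (reverse Polish) token sequence over the parameter bits and evaluated on a stack. Stack-machine instructions need no register operands, which is one general way such an encoding can stay small; no claim is made that this matches the CP's actual design. The token format and the functions `run` and `evaluate_all` are hypothetical.

```python
from typing import List, Sequence


def run(program: List[str], params: Sequence[int]) -> int:
    """Evaluate one postfix Boolean program and return the resulting bit.

    Tokens (a hypothetical encoding): "pK" pushes parameter bit K;
    "NOT" negates the top of the stack; "AND", "OR", "XOR" combine the top two.
    """
    stack: List[int] = []
    for tok in program:
        if tok.startswith("p"):
            stack.append(params[int(tok[1:])] & 1)
        elif tok == "NOT":
            stack.append(1 - stack.pop())
        elif tok in ("AND", "OR", "XOR"):
            b, a = stack.pop(), stack.pop()
            if tok == "AND":
                stack.append(a & b)
            elif tok == "OR":
                stack.append(a | b)
            else:
                stack.append(a ^ b)
        else:
            raise ValueError(f"unknown token {tok!r}")
    if len(stack) != 1:
        raise ValueError("malformed program: expected exactly one result on the stack")
    return stack[0]


def evaluate_all(programs: List[List[str]], params: Sequence[int]) -> List[int]:
    """Evaluate every tunable bit of a (hypothetically encoded) parameterized configuration."""
    return [run(prog, params) for prog in programs]


# f(p) = (p0 AND NOT p1) OR p2, written in postfix order.
example = ["p0", "p1", "NOT", "AND", "p2", "OR"]
print(run(example, [1, 0, 0]))  # 1
```

Calling `evaluate_all` with a new parameter vector yields the specialized configuration bits, which a configuration phase would then write into the FPGA.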