The benefits of using variable-length pipelined operations in high-level synthesis

Authors:
Yosi Ben-Asher;Nadav Rotem
Affiliations:
University of Haifa;University of Haifa
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2013

References:

Force-directed scheduling in automatic data path synthesis. DAC '87 Proceedings of the 24th ACM/IEEE Design Automation Conference
Software pipelining: an effective scheduling technique for VLIW machines. PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Sehwa: A program for synthesis of pipelines. 25 years of DAC Papers on Twenty-five years of electronic design automation
Local optimization on graphs. Discrete Applied Mathematics
High-level synthesis: introduction to chip and system design
Logic synthesis
Iterative modulo scheduling: an algorithm for software pipelining loops. MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Compiler transformations for high-performance computing. ACM Computing Surveys (CSUR)
ILP-based cost-optimal DSP synthesis with module selection and data format conversion. IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Synthesis and Optimization of Digital Circuits
Cycle-time aware architecture synthesis of custom hardware accelerators. CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Introduction to the Scheduling Problem. IEEE Design & Test
Hierarchical Scheduling in High Level Synthesis Using Resource Sharing Across Nested Loops. GLS '99 Proceedings of the Ninth Great Lakes Symposium on VLSI
Swing Modulo Scheduling: A Lifetime-Sensitive Approach. PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Algorithm Design
Towards a Source Level Compiler: Source Level Modulo Scheduling. ICPPW '06 Proceedings of the 2006 International Conference Workshops on Parallel Processing
Streamroller: automatic synthesis of prescribed throughput accelerator pipelines. CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
The impact of loop unrolling on controller delay in high level synthesis. Proceedings of the conference on Design, automation and test in Europe
Compiling code accelerators for FPGAs. CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
A design flow dedicated to multi-mode architectures for DSP applications. Proceedings of the 2007 IEEE/ACM international conference on Computer-aided design
Pattern-based behavior synthesis for FPGA resource reduction. Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
PARO: Synthesis of Hardware Accelerators for Multi-dimensional Dataflow-Intensive Applications. ARC '08 Proceedings of the 4th international workshop on Reconfigurable Computing: Architectures, Tools and Applications
Finding the Best Compromise in Compiling Compound Loops to Verilog. ISVLSI '08 Proceedings of the 2008 IEEE Computer Society Annual Symposium on VLSI
Impact of Loop Unrolling on Area, Throughput and Clock Frequency for Window Operations Based on a Data Schedule Method. CISP '08 Proceedings of the 2008 Congress on Image and Signal Processing, Vol. 1
Parallelization Approaches for Hardware Accelerators - Loop Unrolling Versus Loop Partitioning. ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
Multiple clock and voltage domains for chip multi processors. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Automatic memory partitioning: increasing memory parallelism via data structure partitioning. CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Efficient retiming of large circuits. IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Pipeline vectorization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Path-based scheduling for synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A formal approach to the scheduling problem in high level synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Abstract

Current high-level synthesis systems synthesize arithmetic units with a fixed, known number of stages, and the scheduler mainly determines when units are activated. We focus on scheduling techniques for the high-level synthesis of pipelined arithmetic units in which the number of stages of each operation is a free parameter of the synthesis. This problem is motivated by the ability to automatically create pipelined functional units, such as multipliers, with different pipeline lengths. These units differ in characteristics such as parallelism level, clock latency, and frequency. This article presents the Variable-length Pipeline Scheduler (VPS). The ability to synthesize variable-length pipelined units expands the known scheduling problem of high-level synthesis to include a search for a minimal number of hardware units (operations) and their desired number of stages. The proposed search procedure is based on algorithms that find a local minimum in a d-dimensional grid, thus avoiding the need to evaluate all possible points in the space. We have implemented a C language compiler for VPS targeting FPGAs. Our results demonstrate that using variable-length pipeline units can reduce the overall resource usage and improve the execution time when synthesized onto an FPGA. The proposed search is sufficiently fast, taking only a few seconds, to allow an interactive mode of work. A comparison with xPilot shows a significant saving of hardware resources while maintaining comparable execution times of the resulting circuits. This work is an extension of a previous paper [Ben-Asher and Rotem 2008].
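To make the idea of searching a d-dimensional grid for a local minimum concrete, the following Python sketch may help. It is not the VPS search procedure, which the abstract does not specify in detail; it is a plain greedy neighbor descent. Each grid coordinate stands for the number of pipeline stages of one unit type, and the cost function `toy_cost`, its delay constants, and the bounds of 1 to 8 stages are all hypothetical values chosen for illustration only.

```python
from typing import Callable, Dict, Tuple

Point = Tuple[int, ...]


def grid_local_minimum(
    cost: Callable[[Point], float], start: Point, lo: Point, hi: Point
) -> Tuple[Point, float, int]:
    """Greedy descent on an integer grid.

    Repeatedly moves to the cheapest axis neighbor until no neighbor is
    strictly cheaper. Returns (point, cost, number of points evaluated).
    """
    cache: Dict[Point, float] = {}

    def evaluate(p: Point) -> float:
        # Each grid point is costed at most once.
        if p not in cache:
            cache[p] = cost(p)
        return cache[p]

    current = start
    while True:
        best = current
        for axis in range(len(current)):
            for step in (-1, 1):
                value = current[axis] + step
                if lo[axis] <= value <= hi[axis]:
                    neighbor = current[:axis] + (value,) + current[axis + 1:]
                    if evaluate(neighbor) < evaluate(best):
                        best = neighbor
        if best == current:
            return current, evaluate(current), len(cache)
        current = best


def toy_cost(stages: Point) -> float:
    """Hypothetical cost model (an assumption, not taken from the paper)."""
    mul, add = stages
    # Clock period is set by the slowest stage, plus a fixed register overhead.
    period = max(12.0 / mul, 4.0 / add) + 1.0
    # Deeper pipelines add latency cycles to the schedule.
    cycles = 100 + 10 * (mul + add)
    # Deeper pipelines also need more pipeline registers.
    area = 3 * mul + add
    return period * cycles + 2.0 * area


# Search stage counts in [1, 8] for a multiplier and an adder.
point, value, evaluated = grid_local_minimum(toy_cost, (1, 1), (1, 1), (8, 8))
print(point, value, evaluated)
```

In this toy setting the descent costs only the points on its path and their neighbors rather than all 64 grid points. The point it returns is a local minimum only; other points on the grid may have a lower cost.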