Reducing Memory Constraints in Modulo Scheduling Synthesis for FPGAs

Authors:
Yosi Ben-Asher;Danny Meisler;Nadav Rotem
Affiliations:
University of Haifa;University of Haifa;University of Haifa
Venue:
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Year:
2010

Abstract

In High-Level Synthesis (HLS), the main advantage over software execution is the ability to extract parallelism and so produce small, fast circuits. Modulo Scheduling (MS) parallelizes a loop by overlapping different parts of successive iterations, which makes it an attractive synthesis technique for loop acceleration. In this work we consider two problems that are central to the use of MS when targeting FPGAs. First, current MS techniques sacrifice execution time in order to meet resource and delay constraints. Let the “ideal” execution time be the one that MS would obtain if resource and delay constraints were ignored. We pose the opposite problem, which is better suited to HLS: how to reduce resource constraints without sacrificing the ideal execution time. We focus on reducing the number of memory ports used by the MS synthesis, which we believe is a crucial resource for HLS. Second, MS techniques must be fast enough to allow interactive synthesis times and repeated applications of MS to explore different ways of synthesizing the circuit; current solutions for MS synthesis that can handle memory constraints are too slow for this. We formalize the problem of reducing the number of parallel memory references in every row of the kernel in a novel combinatorial setting. The proposed technique inserts dummy operations into the kernel, thereby performing modulo-shift operations that reduce the maximal number of parallel memory references in a row. Experimental results suggest improved execution times for the synthesized circuits, and the synthesis takes only a few seconds even for large loops.
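
To make the idea of kernel rows and modulo shifts concrete, the following is a minimal Python sketch. It is not the authors' algorithm. It assumes a toy model in which each memory operation is described only by its scheduled cycle, an operation at cycle `t` occupies kernel row `t % ii` (with `ii` the initiation interval), and memory operations are independent, so delaying one by a dummy operation does not move any other. The names `row_loads` and `balance_with_dummies` are hypothetical and chosen for illustration.

```python
from collections import Counter


def row_loads(times, ii):
    """Count memory references per kernel row.

    An operation scheduled at cycle t lands in kernel row t % ii.
    """
    counts = Counter(t % ii for t in times)
    return [counts.get(r, 0) for r in range(ii)]


def balance_with_dummies(times, ii):
    """Greedy toy: delay memory ops to lower the busiest kernel row.

    Assumption (not from the paper): ops are independent, so delaying one
    op by `delay` cycles (as if `delay` dummy ops preceded it) shifts only
    that op to row (t + delay) % ii.
    """
    times = list(times)
    while True:
        loads = row_loads(times, ii)
        worst = max(range(ii), key=lambda r: loads[r])
        move = None
        for i, t in enumerate(times):
            if t % ii != worst:
                continue
            for delay in range(1, ii):
                target = (t + delay) % ii
                # Move only if the target row ends up strictly below the peak.
                if loads[target] + 1 < loads[worst]:
                    move = (i, delay)
                    break
            if move is not None:
                break
        if move is None:
            return times  # no single shift lowers the busiest row
        i, delay = move
        times[i] += delay


# Five memory ops with ii = 3: cycles 0, 3 and 6 all fall in row 0.
schedule = [0, 3, 6, 1, 2]
balanced = balance_with_dummies(schedule, 3)
print(row_loads(schedule, 3), "->", row_loads(balanced, 3))
```

In this toy model the peak number of parallel memory references, and hence the number of ports the kernel would need, falls from 3 to 2 without changing `ii`. A realistic formulation would also have to account for data dependencies, since delaying an operation generally delays its successors.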