Static scheduling of synchronous data flow programs for digital signal processing
IEEE Transactions on Computers
A novel framework of register allocation for software pipelining
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
ILP-based cost-optimal DSP synthesis with module selection and data format conversion
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Software Synthesis from Dataflow Graphs
Software Synthesis from Dataflow Graphs
Embedded Multiprocessors: Scheduling and Synchronization
Embedded Multiprocessors: Scheduling and Synchronization
Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks
Journal of VLSI Signal Processing Systems
A stream compiler for communication-exposed architectures
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Minimising buffer requirements of synchronous dataflow graphs with model checking
Proceedings of the 42nd annual Design Automation Conference
Optimal module and voltage assignment for low-power
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Data and Computation Transformations for Brook Streaming Applications on Multiprocessors
Proceedings of the International Symposium on Code Generation and Optimization
An efficient and versatile scheduling algorithm based on SDC formulation
Proceedings of the 43rd annual Design Automation Conference
Proceedings of the 43rd annual Design Automation Conference
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Orchestrating the execution of stream programs on multicore platforms
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Synergistic execution of stream programs on multicores with accelerators
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
A computing origami: folding streams in FPGAs
Proceedings of the 46th Annual Design Automation Conference
Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
FPGA Pipeline Synthesis Design Exploration Using Module Selection and Resource Sharing
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Exploiting just-enough parallelism when mapping streaming applications in hard real-time systems
Proceedings of the 50th Annual Design Automation Conference
Combining module selection and replication for throughput-driven streaming programs
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Data streaming is a widely-used technique to exploit task-level parallelism in many application domains such as video processing, signal processing and wireless communication. In this paper we propose an efficient system-level synthesis flow to map streaming applications onto FPGAs with consideration of simultaneous computation and communication optimizations. The throughput of a streaming system is significantly impacted by not only the performance and number of replicas of the computation kernels, but also the buffer size allocated for the communications between kernels. In general, module selection/replication and buffer size optimization were addressed separately in previous work. Our approach combines these optimizations together in system scheduling which minimizes the area cost for both logic and memory under the required throughput constraint. We first propose an integer linear program (ILP) based solution to the combined problem which has the optimal quality of results. Then we propose an iterative algorithm which can achieve the near-optimal quality of results but has a significant improvement on the algorithm scalability for large and complex designs. The key contribution is that we have a polynomial-time algorithm for an exact schedulability checking problem and a polynomial-time algorithm to improve the system performance with better module implementation and buffer size optimization. Experimental results show that compared to the separate scheme of module select/replication and buffer size optimization, the combined optimization scheme can gain 62% area saving on average under the same performance requirements. Moreover, our heuristic can achieve 2 to 3 orders of magnitude of speed-up in runtime, with less than 10% area overhead compared to the optimal solution by ILP.