Combining computation and communication optimizations in system synthesis for streaming applications

  • Authors:
  • Jason Cong;Muhuan Huang;Peng Zhang

  • Affiliations:
  • University of California, Los Angeles, Los Angeles, CA, USA;University of California, Los Angeles, Los Angeles, CA, USA;University of California, Los Angeles, Los Angeles, CA, USA

  • Venue:
  • Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
  • Year:
  • 2014

Quantified Score

Hi-index 0.00

Visualization

Abstract

Data streaming is a widely-used technique to exploit task-level parallelism in many application domains such as video processing, signal processing and wireless communication. In this paper we propose an efficient system-level synthesis flow to map streaming applications onto FPGAs with consideration of simultaneous computation and communication optimizations. The throughput of a streaming system is significantly impacted by not only the performance and number of replicas of the computation kernels, but also the buffer size allocated for the communications between kernels. In general, module selection/replication and buffer size optimization were addressed separately in previous work. Our approach combines these optimizations together in system scheduling which minimizes the area cost for both logic and memory under the required throughput constraint. We first propose an integer linear program (ILP) based solution to the combined problem which has the optimal quality of results. Then we propose an iterative algorithm which can achieve the near-optimal quality of results but has a significant improvement on the algorithm scalability for large and complex designs. The key contribution is that we have a polynomial-time algorithm for an exact schedulability checking problem and a polynomial-time algorithm to improve the system performance with better module implementation and buffer size optimization. Experimental results show that compared to the separate scheme of module select/replication and buffer size optimization, the combined optimization scheme can gain 62% area saving on average under the same performance requirements. Moreover, our heuristic can achieve 2 to 3 orders of magnitude of speed-up in runtime, with less than 10% area overhead compared to the optimal solution by ILP.