Optimizing modulo scheduling to achieve reuse and concurrency for stream processors

  • Authors: Li Wang; Jingling Xue; Xuejun Yang

  • Affiliations: School of Computer, National University of Defense Technology, Changsha, China 410073; School of Computer Science and Engineering, UNSW, Sydney, Australia 2052; School of Computer, National University of Defense Technology, Changsha, China 410073

  • Venue: The Journal of Supercomputing
  • Year: 2012

Abstract

Both reuse and concurrency are performance-critical for stream processors. When loop unrolling and software pipelining are applied separately to stream-level loops, reuse, concurrency, or both may be inadequately exploited. In this paper, we optimize modulo scheduling to maximize stream reuse and improve concurrency for stream-level loops. The key insight is that an unrolled and software-pipelined stream-level loop can be described by a set of reuse equations. Guided by these reuse equations, we develop a reuse-aware modulo scheduling algorithm that optimizes both performance objectives, reuse and concurrency, for a loop in a unified framework. Moreover, we describe a code generation algorithm that automatically produces the optimized loop from a given loop. Experimental results obtained on FT64 and by simulation demonstrate the effectiveness of the proposed approach.
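
As background for the concurrency side of the abstract, the following is a minimal sketch of generic modulo scheduling arithmetic for a stream-level loop. It is not the paper's reuse-aware algorithm: the abstract does not state the reuse equations, so reuse is not modelled here. The three-stage load/kernel/store loop, the latencies, the resource names "mem" and "alu", and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy model: each loop iteration runs a fixed list of stages,
# and each stage occupies one named resource for `latency` time units.

def sequential_time(n_iters, latencies):
    """Total time when an iteration finishes before the next one starts."""
    return n_iters * sum(latencies)


def pipelined_time(n_iters, latencies, ii):
    """Total time when a new iteration starts every `ii` time units."""
    if n_iters == 0:
        return 0
    return (n_iters - 1) * ii + sum(latencies)


def resource_min_ii(latencies, resources):
    """Resource-based lower bound on the initiation interval (II):
    the busiest resource's total work per iteration."""
    busy = {}
    for lat, res in zip(latencies, resources):
        busy[res] = busy.get(res, 0) + lat
    return max(busy.values())


def modulo_conflict_free(starts, latencies, resources, ii):
    """Check a schedule against a modulo reservation table: no resource
    may be claimed twice in the same time slot modulo `ii`."""
    table = set()
    for start, lat, res in zip(starts, latencies, resources):
        for t in range(start, start + lat):
            slot = (res, t % ii)
            if slot in table:
                return False
            table.add(slot)
    return True


# Assumed example loop: load (4 units, memory), kernel (6 units, ALU),
# store (2 units, memory), with stages placed back to back.
LATENCIES = [4, 6, 2]
RESOURCES = ["mem", "alu", "mem"]
STARTS = [0, 4, 10]

if __name__ == "__main__":
    ii = resource_min_ii(LATENCIES, RESOURCES)
    print("lower bound on II:", ii)
    print("schedule fits at that II:",
          modulo_conflict_free(STARTS, LATENCIES, RESOURCES, ii))
    print("sequential time for 10 iterations:",
          sequential_time(10, LATENCIES))
    print("pipelined time for 10 iterations:",
          pipelined_time(10, LATENCIES, ii))
```

In this toy model, overlapping the stages of successive iterations shortens the loop only as far as the busiest resource allows, which is the concurrency limit a modulo scheduler works against. How unrolling and stream reuse interact with that limit is the subject of the paper itself and is not captured by this sketch.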