Combined instruction and loop parallelism in array synthesis for FPGAs

  • Authors:
  • Steven Derrien;Sanjay Rajopadhye;Susmita Sur Kolay

  • Affiliations:
  • IRISA, Rennes, France;IRISA, Rennes, France;Indian Statistical Institute, Calcutta, India

  • Venue:
  • Proceedings of the 14th international symposium on Systems synthesis
  • Year:
  • 2001

Abstract

Compiling perfect, uniform-dependence loops to FPGA-based co-processors normally yields processor arrays in which each processing element (PE) executes one instance of the loop body per clock cycle. We develop a transformation framework in which the derived PE can be systematically and automatically pipelined through retiming. We use two well-known transformations, skewing and serialization, by which an arbitrary number of registers may be placed at the PE outputs. These registers are then moved into the PE data-path using standard commercial circuit retimers. Our experiments (based on performance estimates after place-and-route) have been very encouraging. For a number of examples we have seen dramatic performance improvements: speed increases of an order of magnitude with relatively little (always less than 100%) area overhead.
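The effect of moving output registers into the data-path can be illustrated with a toy model. The sketch below is not the authors' framework and does not model any commercial retimer: it assumes a PE whose data-path is a simple linear chain of combinational blocks with hypothetical delays, and it finds the best placement of a given number of registers by exhaustive search. The function names and the delay values are illustrative assumptions.

```python
from itertools import combinations


def clock_period_unretimed(delays):
    """Clock period when all registers sit at the PE output.

    The whole combinational chain must settle within one cycle,
    so the period is the sum of the block delays.
    """
    return sum(delays)


def clock_period_retimed(delays, registers):
    """Best clock period after moving output registers into the chain.

    Toy model (assumption): the data-path is a linear chain, one register
    stays at the output, and the others may be placed between any two
    adjacent blocks. The period is the largest delay of any stage.
    """
    if registers < 1:
        raise ValueError("at least one output register is required")
    n = len(delays)
    stages = min(registers, n)  # extra registers cannot split a single block
    best = sum(delays)
    # Try every way of cutting the chain into `stages` contiguous stages.
    for cuts in combinations(range(1, n), stages - 1):
        bounds = (0, *cuts, n)
        period = max(sum(delays[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = min(best, period)
    return best


if __name__ == "__main__":
    delays = [3, 2, 4, 1, 2]  # hypothetical block delays, arbitrary units
    print("unretimed period:", clock_period_unretimed(delays))
    for k in (1, 3, 5):
        print(k, "registers ->", clock_period_retimed(delays, k))
```

In this model the shorter clock period comes from extra pipeline stages, which is why the transformations that supply registers at the PE outputs have to come first. The exhaustive search is only practical for very short chains; it is used here for clarity.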