Definitions of dependence distance
ACM Letters on Programming Languages and Systems (LOPLAS)
ACM Computing Surveys (CSUR)
Successive overrelaxation (SOR) and related methods
Journal of Computational and Applied Mathematics - Special issue on numerical analysis 2000 Vol. III: linear algebra
Control Mechanism for Software Pipelining on Nested Loop
APDC '97 Proceedings of the 1997 Advances in Parallel and Distributed Computing Conference (APDC '97)
Pipeline Vectorization for Reconfigurable Systems
FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Co-Processor Synthesis: A New Methodology for Embedded Software Acceleration
Proceedings of the conference on Design, automation and test in Europe - Volume 1
Single-Dimension Software Pipelining for Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Optimized Generation of Data-Path from C Codes for FPGAs
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Combining optimizations in automated low power design
Proceedings of the Conference on Design, Automation and Test in Europe
Automated Mapping of the MapReduce Pattern onto Parallel Computing Platforms
Journal of Signal Processing Systems
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Hi-index | 0.00 |
Most hardware compilers apply loop pipelining to increase the parallelism achieved, but pipelining is restricted to the only innermost level in a nested loop. In this work we extend and adapt an existing outer loop pipelining approach known as single dimension software pipelining to generate schedules for field-programmable gate-array (FPGA) hardware coprocessors. Each loop level in nine test loops is pipelined and the resulting schedules are implemented in VHDL and targeted to an Altera Stratix II FPGA. The results show that the fastest solution for all but one of the loops occurs when pipelining is applied one to three levels above the innermost loop. Across the nine test loops we achieve an acceleration over the innermost loop solution of up to seven times, with a mean speedup of 3.2 times. The results suggest that inclusion of outer loop pipelining in future hardware compilers may be worthwhile as it can allow significantly improved results to be achieved at the cost of a small increase in compile time.