Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
A systolic algorithm for QSVD updating
Signal Processing - Theme issue on singular value decomposition
A study of the cyclic scheduling problem on parallel processors
Discrete Applied Mathematics - Special issue: Combinatorial Optimization 1992 (CO92)
Loop Shifting for Loop Compaction
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Tiling imperfectly-nested loop nests
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators
Journal of VLSI Signal Processing Systems
Efficient Pipelining of Nested Loops: Unroll-and-Squash
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Logarithmic Number System and Floating-Point Arithmetics on FPGA
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
Improving Software Pipelining With Unroll-and-Jam
HICSS '96 Proceedings of the 29th Hawaii International Conference on System Sciences Volume 1: Software Technology and Architecture
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Loop Shifting and Compaction for the High-Level Synthesis of Designs with Complex Control Flow
Proceedings of the conference on Design, automation and test in Europe - Volume 1
Scheduling of Iterative Algorithms on FPGA with Pipelined Arithmetic Unit
RTAS '04 Proceedings of the 10th IEEE Real-Time and Embedded Technology and Applications Symposium
Implementation of the least-squares lattice with order and forgetting factor estimation for FPGA
EURASIP Journal on Advances in Signal Processing
A truly two-dimensional systolic array FPGA implementation of QR decomposition
ACM Transactions on Embedded Computing Systems (TECS)
A cyclic scheduling problem with an undetermined number of parallel identical processors
Computational Optimization and Applications
Hi-index | 0.00 |
This paper deals with the optimization of iterative algorithms with matrix operations or nested loops for hardware implementation in Field Programmable Gate Arrays (FPGA), using Integer Linear Programming (ILP). The method is demonstrated on an implementation of the Finite Interval Constant Modulus Algorithm. It is an equalization algorithm, suitable for modern communication systems (4G and behind). For the floating-point calculations required in the algorithm, two arithmetic libraries were used in the FPGA implementation: one based on the logarithmic number system, the other using floating-point number system in the standard IEEE format. Both libraries use pipelined modules. Traditional approaches to the scheduling of nested loops lead to a relatively large code, which is unsuitable for FPGA implementation. This paper presents a new high-level synthesis methodology, which models both, iterative loops and imperfectly nested loops, by means of the system of linear inequalities. Moreover, memory access is considered as an additional resource constraint. Since the solutions of ILP formulated problems are known to be computationally intensive, an important part of the article is devoted to the reduction of the problem size.