Fundamentals of speech recognition
Formulation and evaluation of scheduling techniques for control flow graphs
EURO-DAC '95/EURO-VHDL '95 Proceedings of the conference on European design automation
SUSAN—A New Approach to Low Level Image Processing
International Journal of Computer Vision
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Heuristic Algorithms for Scheduling Iterative Task Computations on Distributed Memory Machines
IEEE Transactions on Parallel and Distributed Systems
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
A new strategy for multiprocessor scheduling of cyclic task graphs
International Journal of High Performance Computing and Networking
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A technique for parallelising multiple loops in a heterogeneous computing system is presented. Loops are first unrolled and then broken up into multiple tasks, which are mapped to reconfigurable hardware. A performance-driven optimisation is applied to find the best unrolling factor for each loop under hardware size constraints. The approach is demonstrated using three applications: speech recognition, image processing, and the N-Body problem. Experimental results show that a maximum speedup of 34 times over a 2.6GHz microprocessor is achieved for the N-Body application on a 274MHz FPGA, which is 4.1 times higher than that of an approach without unrolling.
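The idea of choosing one unrolling factor per loop under a hardware size constraint can be illustrated with a minimal sketch. It is not the optimisation used in the paper: the cost model (each unrolled copy of a loop body costs a fixed area, the loops run one after another, and unrolling by a factor f divides the iteration count by f), the exhaustive search, and every name in it (`Loop`, `estimated_cycles`, `best_unroll_factors`, `area_budget`, `max_factor`) are assumptions made here for illustration only.

```python
from dataclasses import dataclass
from itertools import product
from math import ceil


@dataclass(frozen=True)
class Loop:
    name: str
    trip_count: int       # iterations of the original loop
    cycles_per_iter: int  # assumed latency of one iteration of the body
    area_per_copy: int    # assumed hardware cost of one copy of the body


def estimated_cycles(loop, factor):
    """Assumed model: `factor` copies of the body run in parallel."""
    return ceil(loop.trip_count / factor) * loop.cycles_per_iter


def best_unroll_factors(loops, area_budget, max_factor=16):
    """Exhaustively search per-loop unrolling factors.

    Returns (total_cycles, total_area, factors) for the fastest
    combination that fits in `area_budget`, or None if nothing fits.
    Ties on cycles are broken in favour of the smaller area.
    """
    best = None
    for factors in product(range(1, max_factor + 1), repeat=len(loops)):
        area = sum(f * l.area_per_copy for f, l in zip(factors, loops))
        if area > area_budget:
            continue  # violates the hardware size constraint
        cycles = sum(estimated_cycles(l, f) for f, l in zip(factors, loops))
        candidate = (cycles, area, factors)
        if best is None or candidate < best:
            best = candidate
    return best


if __name__ == "__main__":
    # Hypothetical loops; the numbers are made up for the example.
    loops = [Loop("a", 64, 4, 10), Loop("b", 32, 2, 30)]
    print(best_unroll_factors(loops, area_budget=100, max_factor=8))
```

Exhaustive search is only practical for a handful of loops and small factor ranges, since the number of combinations grows as `max_factor` raised to the number of loops. A realistic flow would need a more accurate performance and area model as well as a smarter search.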