Automatic mapping of nested loops to FPGAS
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
C is for circuits: capturing FPGA circuits as sequential code for portability
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
A task parallel algorithm for finding all-pairs shortest paths using the GPU
International Journal of High Performance Computing and Networking
A fast poisson solver for hybrid reconfigurable system
ARC'13 Proceedings of the 9th international conference on Reconfigurable Computing: architectures, tools, and applications
Hi-index | 0.00 |
Field-Programmable Gate Arrays (FPGAs) are being employed in high performance computing systems owing to their potential to accelerate a wide variety of long-running routines. Parallel FPGA-based designs often yield a very high speedup. Applications using these designs on reconfigurable supercomputers involve software on the system managing computation on the FPGA. To extract maximum performance from an FPGA design at the application level, it becomes necessary to minimize associated data movement costs on the system. We address this hardware/software integration challenge in the context of the All-Pairs Shortest-Paths (APSP) problem in a directed graph. We employ a parallel FPGA-based design using a blocked algorithm to solve large instances of APSP. With appropriate design choices and optimizations, experimental results on the Cray XD1 show that the FPGA-based implementation sustains an application-level speedup of 15 over an optimized CPU-based implementation.