Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths

Authors:
Uday Bondhugula;Ananth Devulapalli;James Dinan;Joseph Fernando;Pete Wyckoff;Eric Stahlberg;P. Sadayappan
Affiliations:
Ohio State University;Ohio Supercomputer Center, Springfield;Ohio State University;Ohio Supercomputer Center, Springfield;Ohio Supercomputer Center, Columbus, OH;Ohio Supercomputer Center, Columbus, OH;Ohio State University
Venue:
FCCM '06 Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Year:
2006

Citing 0
Cited 4

Automatic mapping of nested loops to FPGAs

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
C is for circuits: capturing FPGA circuits as sequential code for portability

Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
A task parallel algorithm for finding all-pairs shortest paths using the GPU

International Journal of High Performance Computing and Networking
A fast Poisson solver for hybrid reconfigurable system

ARC'13 Proceedings of the 9th international conference on Reconfigurable Computing: architectures, tools, and applications

Abstract

Field-Programmable Gate Arrays (FPGAs) are being employed in high-performance computing systems owing to their potential to accelerate a wide variety of long-running routines. Parallel FPGA-based designs often yield very high speedups. Applications that use these designs on reconfigurable supercomputers rely on software on the system to manage computation on the FPGA. To extract maximum performance from an FPGA design at the application level, the associated data movement costs on the system must be minimized. We address this hardware/software integration challenge in the context of the All-Pairs Shortest-Paths (APSP) problem in a directed graph. We employ a parallel FPGA-based design using a blocked algorithm to solve large instances of APSP. With appropriate design choices and optimizations, experimental results on the Cray XD1 show that the FPGA-based implementation sustains an application-level speedup of 15 over an optimized CPU-based implementation.
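
The abstract does not spell out the blocked algorithm, so the following is only a minimal software sketch of one common blocked formulation of Floyd-Warshall, assuming a dense distance matrix with non-negative cycle weights. The function names (`floyd_warshall`, `relax_tile`, `blocked_apsp`) and the tile size `b` are hypothetical and are not taken from the paper; the sketch illustrates why tiling helps limit data movement (each round touches one tile at a time), not how the FPGA kernel or the Cray XD1 host code is organized.

```python
import math

INF = math.inf  # marks "no edge" in the distance matrix


def floyd_warshall(dist):
    """Plain O(n^3) Floyd-Warshall, used here as a reference."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def relax_tile(d, bi, bj, bk, b, n):
    """Relax tile (bi, bj) through the intermediate vertices of block bk."""
    for k in range(bk * b, min((bk + 1) * b, n)):
        for i in range(bi * b, min((bi + 1) * b, n)):
            for j in range(bj * b, min((bj + 1) * b, n)):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]


def blocked_apsp(dist, b):
    """Blocked Floyd-Warshall with tile size b (assumed sketch, not the paper's code)."""
    n = len(dist)
    d = [row[:] for row in dist]
    nb = (n + b - 1) // b  # number of tiles per dimension
    for bk in range(nb):
        # Phase 1: the diagonal tile depends only on itself.
        relax_tile(d, bk, bk, bk, b, n)
        # Phase 2: tiles in the same tile row and tile column as the diagonal tile.
        for t in range(nb):
            if t != bk:
                relax_tile(d, bk, t, bk, b, n)
                relax_tile(d, t, bk, bk, b, n)
        # Phase 3: all remaining tiles; these are mutually independent.
        for bi in range(nb):
            for bj in range(nb):
                if bi != bk and bj != bk:
                    relax_tile(d, bi, bj, bk, b, n)
    return d


# Small directed example graph (5 vertices, so the last tile is partial for b = 2).
graph = [[0 if i == j else INF for j in range(5)] for i in range(5)]
for u, v, w in [(0, 1, 3), (1, 2, 1), (2, 3, 7), (3, 4, 2),
                (0, 4, 20), (4, 0, 1), (2, 0, 4)]:
    graph[u][v] = w

result = blocked_apsp(graph, 2)
```

In this sketch the phase 3 tiles within a round do not depend on each other, which is the kind of independence a parallel design can exploit; how the paper's design actually schedules tiles and transfers them between host and FPGA is not described in the abstract.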