Journal of the ACM (JACM)
Parallel programming in OpenMP
Parallel programming in OpenMP
Communications of the ACM
Introduction to algorithms
A Blocked All-Pairs Shortest-Path Algorithm
SWAT '00 Proceedings of the 7th Scandinavian Workshop on Algorithm Theory
Generating association graphs of non-cooccurring text objects using transitive methods
Proceedings of the 2005 ACM symposium on Applied computing
Program generation for the all-pairs shortest path problem
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths
FCCM '06 Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
ICPADS '07 Proceedings of the 13th International Conference on Parallel and Distributed Systems - Volume 01
All-pairs shortest-paths for large graphs on the GPU
Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
ISPA '08 Proceedings of the 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications
Integration, the VLSI Journal
Accelerating large graph algorithms on the GPU using CUDA
HiPC'07 Proceedings of the 14th international conference on High performance computing
This paper proposes a method for accelerating the all-pairs shortest paths (APSP) computation on the graphics processing unit (GPU). Our method is based on Harish's iterative algorithm, which computes single-source shortest path (SSSP) costs in parallel on the GPU. In addition to this fine-grained parallelism, we exploit coarse-grained parallelism through a task parallelisation scheme that associates each task with one SSSP problem. The scheme solves multiple SSSP problems at a time, so graph data can be accessed efficiently by sharing it between processing elements in the GPU. Furthermore, combining fine- and coarse-grained parallelisation yields higher parallelism, which increases the efficiency of highly threaded code. As a result, our method is 2.8 to 13 times faster than the previous SSSP-based implementation, depending on the graph topology. We also show that the overhead of path recording, which is needed after cost computation, increases the execution time by 7.7%.
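The task-parallel idea can be illustrated with a minimal sequential sketch. It is not the GPU implementation described in the abstract. It assumes an edge-list graph with non-negative weights, and the names `batched_sssp`, `all_pairs`, and `batch_size` are hypothetical. Each SSSP task owns one row of the cost table, and every edge is read once per sweep and reused by all tasks in the batch, which stands in for sharing graph data between processing elements.

```python
import math


def batched_sssp(num_vertices, edges, sources):
    """Solve several SSSP problems at once by iterative edge relaxation.

    edges:   list of (u, v, weight) tuples, weights assumed non-negative.
    sources: one source vertex per SSSP task in the batch.
    Returns cost[t][v], the shortest-path cost from sources[t] to v.
    """
    cost = [[math.inf] * num_vertices for _ in sources]
    for t, s in enumerate(sources):
        cost[t][s] = 0.0

    changed = True
    while changed:  # repeat sweeps until no cost improves
        changed = False
        for u, v, w in edges:
            # The edge (u, v, w) is fetched once and reused by every task.
            for t in range(len(sources)):
                candidate = cost[t][u] + w
                if candidate < cost[t][v]:
                    cost[t][v] = candidate
                    changed = True
    return cost


def all_pairs(num_vertices, edges, batch_size):
    """Build the APSP cost table by running SSSP tasks in batches."""
    table = []
    for start in range(0, num_vertices, batch_size):
        batch = list(range(start, min(start + batch_size, num_vertices)))
        table.extend(batched_sssp(num_vertices, edges, batch))
    return table


if __name__ == "__main__":
    demo_edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0), (2, 3, 1.0)]
    for row in all_pairs(4, demo_edges, batch_size=2):
        print(row)
```

On a GPU, the two inner loops would presumably map to threads, with one dimension covering the edges or vertices (fine-grained) and the other covering the SSSP tasks in the batch (coarse-grained). That mapping is an interpretation of the abstract and is not confirmed by it.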