We consider the computation of shortest paths on Graphics Processing Units (GPUs). The blocked recursive elimination strategy we use is applicable to a class of algorithms (such as all-pairs shortest-paths, transitive closure, and LU decomposition without pivoting) that share similar data access patterns. Using the all-pairs shortest-paths problem as an example, we demonstrate the potential gains for this class of algorithms. The impressive computational power and memory bandwidth of the GPU make it an attractive platform for such computationally intensive algorithms. Although GPU implementations of these algorithms have previously outperformed CPU implementations in terms of raw speed, their utilization of the underlying computational resources was quite low. We implemented a recursively partitioned all-pairs shortest-paths algorithm that harnesses the power of GPUs better than existing implementations. The alternate schedule of path computations allowed us to cast almost all operations as matrix-matrix multiplications over a semiring. Since matrix-matrix multiplication is highly optimized and has a high ratio of computation to communication, our implementation does not suffer from the premature saturation of bandwidth resources that iterative algorithms do. By increasing temporal locality, our implementation runs more than two orders of magnitude faster on an NVIDIA 8800 GPU than on an Opteron CPU. Our work provides evidence that programmers should rethink algorithms instead of directly porting them to the GPU.
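The algebraic core the abstract refers to can be illustrated with a minimal sketch: over the (min, +) semiring, "multiplying" two distance matrices composes paths, and repeatedly squaring the adjacency matrix yields all-pairs shortest paths. The sketch below is a plain sequential illustration of that idea, not the paper's blocked recursive GPU implementation; all function names are ours.

```python
# Illustrative sketch: APSP via matrix "multiplication" over the
# (min, +) semiring. In the semiring, scalar addition is replaced by
# min and scalar multiplication by +, so C[i][j] = min_k A[i][k] + B[k][j]
# composes the shortest i->j path that passes through some k.

INF = float("inf")

def minplus_matmul(A, B):
    """Semiring matrix product: C[i][j] = min over k of A[i][k] + B[k][j]."""
    n = len(A)
    C = [[INF] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = A[i][k]
            if aik == INF:
                continue  # no i->k path yet; nothing to compose
            for j in range(n):
                cand = aik + B[k][j]
                if cand < C[i][j]:
                    C[i][j] = cand
    return C

def apsp(adj):
    """Repeated min-plus squaring of the adjacency matrix.

    After ceil(log2(n-1)) squarings, entry [i][j] holds the shortest
    distance over paths of any length (assuming no negative cycles).
    """
    n = len(adj)
    D = [row[:] for row in adj]
    length = 1
    while length < n - 1:
        D = minplus_matmul(D, D)
        length *= 2
    return D

# Small directed example: adj[i][j] is the edge weight i->j, INF if absent,
# 0 on the diagonal.
graph = [
    [0,   3,  INF, 7],
    [8,   0,   2, INF],
    [5, INF,   0,  1],
    [2, INF, INF,  0],
]
dist = apsp(graph)
print(dist[0][2])  # -> 5, via 0 -> 1 -> 2
```

Because the inner routine is structurally an ordinary matrix multiplication with (+, *) swapped for (min, +), the same blocking, tiling, and scheduling machinery developed for dense GEMM applies, which is what makes the paper's cast onto highly optimized matrix-matrix multiplication possible.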