Computing Strongly Connected Components in Parallel on CUDA

Authors:
Jiri Barnat;Petr Bauch;Lubos Brim;Milan Ceska
Venue:
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
Year:
2011

Citing 0
Cited 8

A GPU implementation of inclusion-based points-to analysis
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming

Designing fast LTL model checking algorithms for many-core GPUs
Journal of Parallel and Distributed Computing

Parallel algorithm for landform attributes representation on multicore and Multi-GPU systems
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I

On parallel software verification using boolean equation systems
SPIN'12 Proceedings of the 19th international conference on Model Checking Software

Morph algorithms on GPUs
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming

Auto-tuning methodology to represent landform attributes on multicore and multi-GPU systems
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores

On fast parallel detection of strongly connected components (SCC) in small-world graphs
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Efficient decomposition of strongly connected components on GPUs
Journal of Systems Architecture: the EUROMICRO Journal

Abstract

Decomposing a directed graph into its strongly connected components is a fundamental graph problem that arises in many scientific and commercial applications. In this paper we show how some existing parallel algorithms can be reformulated so that they can be accelerated with NVIDIA CUDA technology. In particular, we design a new CUDA-aware procedure for pivot selection and adapt selected parallel algorithms for CUDA-accelerated computation. We also demonstrate experimentally that a single GTX 480 GPU card outperforms the optimal serial CPU implementation by an order of magnitude in most cases, and by a factor of 40 on some sufficiently large instances. This is an interesting result because, unlike the serial CPU algorithm, the parallel algorithms do not have optimal asymptotic complexity.
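
The abstract does not name the parallel algorithms it adapts, nor does it describe the CUDA-aware pivot rule. As a rough illustration of what pivot-based SCC decomposition means, the following is a minimal serial sketch of the well-known Forward-Backward scheme, which is assumed here only as a representative example. The names `reachable` and `fb_scc` are hypothetical, and `min(sub)` is a placeholder pivot rule, not the procedure proposed in the paper.

```python
from collections import defaultdict


def reachable(start, adj, allowed):
    """Return the vertices of `allowed` reachable from `start` along `adj`."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in allowed and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def fb_scc(vertices, edges):
    """Forward-Backward SCC decomposition (serial illustration only)."""
    fwd = defaultdict(list)  # successors
    bwd = defaultdict(list)  # predecessors
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)

    components = []
    work = [set(vertices)]  # independent subgraphs still to decompose
    while work:
        sub = work.pop()
        if not sub:
            continue
        pivot = min(sub)  # placeholder pivot rule (assumption)
        f = reachable(pivot, fwd, sub)  # forward closure of the pivot
        b = reachable(pivot, bwd, sub)  # backward closure of the pivot
        scc = f & b  # the pivot's component
        components.append(scc)
        # No SCC spans two of these remainders, so each can be
        # processed independently of the others.
        work.extend([f - scc, b - scc, sub - (f | b)])
    return components


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)]
result = fb_scc(range(6), edges)
```

The three remainders produced for each pivot are mutually independent, which is the property a parallel implementation can exploit. In the worst case the number of reachability passes grows with the number of components, which is one way such a scheme can fall short of the linear-time bound of serial SCC algorithms.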