Decomposing a directed graph into its strongly connected components is a fundamental graph problem that arises in many scientific and commercial applications. In this paper we show how some existing parallel algorithms can be reformulated so that they can be accelerated with NVIDIA CUDA technology. In particular, we design a new CUDA-aware procedure for pivot selection and adapt selected parallel algorithms for CUDA-accelerated computation. We also demonstrate experimentally that a single GTX 480 GPU card outperforms the optimal serial CPU implementation by an order of magnitude in most cases, and by a factor of 40 on some sufficiently large instances. This result is notable because, unlike the serial CPU algorithm, the parallel algorithms do not have optimal asymptotic complexity.
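To make the role of the pivot concrete, the following is a minimal serial sketch of one well-known pivot-based decomposition scheme, the Forward-Backward approach. The abstract does not name the algorithms that were adapted, so the choice of this scheme is an assumption, and the names `reachable` and `fb_scc` are hypothetical. The pivot rule used here (smallest vertex id) is only a placeholder and is not the CUDA-aware procedure proposed in the paper.

```python
from collections import defaultdict


def reachable(start, adj, allowed):
    """Return the vertices of `allowed` reachable from `start` via `adj`."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in allowed and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def fb_scc(vertices, edges):
    """Decompose a directed graph into SCCs with a Forward-Backward scheme.

    Illustrative serial sketch only; a GPU version would run the two
    reachability searches and the independent subproblems in parallel.
    """
    fwd = defaultdict(list)  # successors
    bwd = defaultdict(list)  # predecessors
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)

    components = []
    work = [set(vertices)]
    while work:
        sub = work.pop()
        if not sub:
            continue
        pivot = min(sub)  # placeholder pivot rule (assumption)
        f = reachable(pivot, fwd, sub)  # forward closure of the pivot
        b = reachable(pivot, bwd, sub)  # backward closure of the pivot
        scc = f & b  # the pivot's component is the intersection
        components.append(scc)
        # Every remaining component lies entirely inside one of these
        # three sets, so they can be processed independently.
        work.extend([f - scc, b - scc, sub - (f | b)])
    return components


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)]
    print(sorted(sorted(c) for c in fb_scc(range(6), edges)))
```

The three leftover vertex sets are disjoint and no component spans two of them, which is what makes schemes of this kind attractive for parallel hardware. The total work can exceed that of a linear-time serial algorithm, which is consistent with the abstract's remark that the parallel algorithms are not asymptotically optimal.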