Efficient decomposition of strongly connected components on GPUs

Authors:
Guohui Li; Zhe Zhu; Zhang Cong; Fumin Yang
Affiliations:
School of Computer Science & Technology, Huazhong University of Science & Technology, China; School of Computer Science & Technology, Huazhong University of Science & Technology, China; School of Mathematics & Computer Science, Wuhan Polytechnic University, China; School of Computer Science & Technology, Huazhong University of Science & Technology, China
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2014

Abstract

The GPU (Graphics Processing Unit) has recently become one of the most power-efficient processors in embedded and many other environments, and is being integrated into more and more SoCs (Systems on Chip). Modern GPUs therefore play an important role in power-aware computing. Strongly Connected Component (SCC) decomposition is a fundamental graph algorithm with wide applications in model checking, electronic design automation, social network analysis, and other fields. GPUs have been shown to have great potential for accelerating many types of computation, including graph algorithms. Recent work has demonstrated the feasibility of SCC decomposition on GPUs, but the existing implementation gives insufficient consideration to the distinctive GPU programming model and consequently performs poorly on irregular and sparse graphs. This paper presents a new GPU SCC decomposition algorithm designed to fully utilize contemporary embedded and desktop GPU architectures. In particular, a subgraph numbering scheme is proposed to manage subgraph IDs safely and efficiently and to serve as the basis for efficient source selection. Furthermore, we adopt a multi-source partition procedure that greatly reduces the recursion depth, and a vertex labeling approach that optimizes GPU memory access. The evaluation results show that the proposed approach achieves up to 41x speedup over Tarjan's algorithm, one of the most efficient sequential SCC decomposition algorithms, and up to 3.8x speedup over previous GPU algorithms.
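
The abstract does not spell out the partition procedure, so the following is only a minimal sequential sketch of the general idea of partitioning a graph into subgraphs identified by per-vertex labels. It assumes a forward-backward reachability scheme (a common basis for parallel SCC decomposition) with a single pivot per subgraph; it is not the authors' multi-source GPU algorithm. All names (`reach`, `scc_forward_backward`, `label`, `members`) are hypothetical and chosen for illustration.

```python
from collections import deque


def reach(adj, start, label, sub_id):
    """BFS from `start`, visiting only vertices labeled with `sub_id`."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if label[v] == sub_id and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def scc_forward_backward(n, edges):
    """Return a list mapping each vertex 0..n-1 to an SCC index.

    Illustrative assumption: one pivot per subgraph, processed sequentially.
    """
    adj = [[] for _ in range(n)]    # forward adjacency lists
    radj = [[] for _ in range(n)]   # reversed adjacency lists
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    label = [0] * n                 # subgraph ID of each vertex (-1 = done)
    scc_of = [-1] * n
    members = {0: list(range(n))}   # subgraph ID -> its vertices
    work = [0] if n > 0 else []     # subgraph IDs still to be partitioned
    next_id = 1
    n_scc = 0

    while work:
        sid = work.pop()
        verts = members.pop(sid)
        pivot = verts[0]
        fwd = reach(adj, pivot, label, sid)    # reachable from the pivot
        bwd = reach(radj, pivot, label, sid)   # can reach the pivot

        # Split the subgraph by membership in the two reachability sets.
        parts = {}
        for v in verts:
            parts.setdefault((v in fwd, v in bwd), []).append(v)

        # Vertices in both sets form the pivot's SCC; retire them.
        for v in parts.pop((True, True)):
            scc_of[v] = n_scc
            label[v] = -1
        n_scc += 1

        # Every other group contains only whole SCCs; give it a fresh ID.
        for group in parts.values():
            for v in group:
                label[v] = next_id
            members[next_id] = group
            work.append(next_id)
            next_id += 1

    return scc_of


# Example: a 3-cycle, a 2-cycle reachable from it, and an isolated vertex.
example_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)]
example_result = scc_forward_backward(6, example_edges)
```

In this sketch each partition step relabels vertices with fresh subgraph IDs, so later reachability searches stay inside one subgraph simply by comparing labels. Handing out fresh IDs safely when many subgraphs are split concurrently is the kind of problem a subgraph numbering scheme has to address on a GPU.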