Accelerating large graph algorithms on the GPU using CUDA

Authors:
Pawan Harish; P. J. Narayanan
Affiliations:
Center for Visual Information Technology, International Institute of Information Technology Hyderabad, India;Center for Visual Information Technology, International Institute of Information Technology Hyderabad, India
Venue:
HiPC'07 Proceedings of the 14th international conference on High performance computing
Year:
2007


Abstract

Large graphs involving millions of vertices are common in many practical applications and are challenging to process. Practical-time implementations on high-end computers have been reported, but such machines are accessible only to a few. Today's Graphics Processing Units (GPUs) offer high computational power at a low price, but they have a restrictive programming model and are tricky to use. The G80 line of Nvidia GPUs can be treated as a SIMD processor array using the CUDA programming model. We present a few fundamental algorithms, including breadth-first search, single-source shortest path, and all-pairs shortest path, using CUDA on large graphs. We can compute the single-source shortest path on a 10-million-vertex graph in 1.5 seconds using the Nvidia 8800GTX GPU costing $600. In some cases, the optimal sequential algorithm is not the fastest on the GPU architecture. GPUs have great potential as high-performance co-processors.
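
The abstract does not describe the kernels themselves, so the following is only a minimal sketch of how breadth-first search can be phrased for a SIMD-style processor array in CUDA. It is not the authors' implementation. It assumes the graph is stored in compressed sparse row (CSR) form, assigns one thread to each vertex, and launches one kernel per BFS level. The names bfs_level_kernel, bfs_levels, row_offsets, and col_indices are hypothetical, and CUDA error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per vertex. A thread whose vertex lies on the current frontier
// labels each unvisited neighbour with current + 1 and raises `changed`.
__global__ void bfs_level_kernel(const int *row_offsets, const int *col_indices,
                                 int *level, int num_vertices, int current,
                                 int *changed) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= num_vertices || level[v] != current) return;
    for (int e = row_offsets[v]; e < row_offsets[v + 1]; ++e) {
        int u = col_indices[e];
        if (level[u] == -1) {
            level[u] = current + 1;  // benign race: every writer stores the same value
            *changed = 1;
        }
    }
}

// Host driver (assumed CSR input): returns the BFS level of every vertex,
// or -1 for vertices unreachable from `source`.
std::vector<int> bfs_levels(const std::vector<int> &row_offsets,
                            const std::vector<int> &col_indices, int source) {
    int n = static_cast<int>(row_offsets.size()) - 1;
    assert(source >= 0 && source < n);
    std::vector<int> level(n, -1);
    level[source] = 0;

    int *d_row, *d_col, *d_level, *d_changed;
    cudaMalloc(&d_row, row_offsets.size() * sizeof(int));
    cudaMalloc(&d_col, col_indices.size() * sizeof(int));
    cudaMalloc(&d_level, n * sizeof(int));
    cudaMalloc(&d_changed, sizeof(int));
    cudaMemcpy(d_row, row_offsets.data(), row_offsets.size() * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_col, col_indices.data(), col_indices.size() * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_level, level.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    for (int current = 0;; ++current) {
        int changed = 0;
        cudaMemcpy(d_changed, &changed, sizeof(int), cudaMemcpyHostToDevice);
        bfs_level_kernel<<<blocks, threads>>>(d_row, d_col, d_level, n, current,
                                              d_changed);
        // This blocking copy also waits for the kernel to finish.
        cudaMemcpy(&changed, d_changed, sizeof(int), cudaMemcpyDeviceToHost);
        if (!changed) break;  // no new vertex was labelled: traversal is done
    }

    cudaMemcpy(level.data(), d_level, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_row);
    cudaFree(d_col);
    cudaFree(d_level);
    cudaFree(d_changed);
    return level;
}

int main() {
    // Undirected example graph with edges 0-1, 0-2, 1-3, 2-3, 3-4.
    std::vector<int> row_offsets = {0, 2, 4, 6, 9, 10};
    std::vector<int> col_indices = {1, 2, 0, 3, 0, 3, 1, 2, 4, 3};
    std::vector<int> level = bfs_levels(row_offsets, col_indices, 0);
    for (size_t v = 0; v < level.size(); ++v)
        std::printf("vertex %zu: level %d\n", v, level[v]);
    return 0;
}
```

On a machine with a CUDA-capable device, the example graph should yield levels 0, 1, 1, 2, 3 for vertices 0 through 4. Every kernel launch scans all vertices, not just the frontier, so a graph with many BFS levels repeats that full scan once per level. That cost is one way to read the abstract's remark that the optimal sequential algorithm is not always the fastest choice on the GPU.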