Large graphs involving millions of vertices are common in many practical applications and are challenging to process. Implementations that run in practical time on high-end computers have been reported, but such machines are accessible only to a few. Today's Graphics Processing Units (GPUs) offer high computational power at a low price, but they have a restrictive programming model and are tricky to use. The G80 line of Nvidia GPUs can be treated as a SIMD processor array using the CUDA programming model. We present a few fundamental algorithms (including breadth-first search, single-source shortest path, and all-pairs shortest path) using CUDA on large graphs. We can compute the single-source shortest path on a 10-million-vertex graph in 1.5 seconds using the Nvidia 8800GTX GPU, which costs $600. In some cases, the optimal sequential algorithm is not the fastest on the GPU architecture. GPUs have great potential as high-performance co-processors.
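The abstract does not spell out how the algorithms are mapped onto the SIMD array, so the following is only a minimal sketch of one common approach: a level-synchronous breadth-first search in which every vertex is examined once per level. It is written in plain Python rather than CUDA so that it runs anywhere. The compressed adjacency layout (`row_offsets`, `col_indices`), the boolean `frontier` array, and the function name `bfs_levels` are all assumptions made for illustration, not details taken from the paper.

```python
def bfs_levels(row_offsets, col_indices, source):
    """Level-synchronous BFS on a graph in compressed adjacency form.

    row_offsets[v] .. row_offsets[v + 1] delimits the neighbours of v
    inside col_indices. Returns the hop count from `source` to every
    vertex, with -1 marking unreachable vertices.
    (Hypothetical helper; the layout is an assumption, not the paper's.)
    """
    n = len(row_offsets) - 1
    cost = [-1] * n
    frontier = [False] * n
    cost[source] = 0
    frontier[source] = True

    # One pass of the outer loop corresponds to one kernel launch.
    while any(frontier):
        next_frontier = [False] * n
        # On a GPU, each value of v would be handled by its own thread;
        # here the per-vertex work is simply run one vertex at a time.
        for v in range(n):
            if not frontier[v]:
                continue
            for e in range(row_offsets[v], row_offsets[v + 1]):
                w = col_indices[e]
                if cost[w] == -1:
                    # Every frontier vertex writes the same value for w,
                    # so concurrent writers would agree on the result.
                    cost[w] = cost[v] + 1
                    next_frontier[w] = True
        frontier = next_frontier
    return cost


# Edges: 0->1, 0->2, 1->3, 2->3; vertex 4 is isolated.
row_offsets = [0, 2, 3, 4, 4, 4]
col_indices = [1, 2, 3, 3]
print(bfs_levels(row_offsets, col_indices, 0))  # [0, 1, 1, 2, -1]
```

This formulation touches all vertices at every level, so it performs more total work than a queue-based sequential BFS. In exchange, the per-vertex steps within a level are independent of one another, which is the kind of trade-off the abstract alludes to when it notes that the optimal sequential algorithm is not always the fastest on the GPU.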