Graph-Based Algorithms for Boolean Function Manipulation
IEEE Transactions on Computers
Type inference and semi-unification
LFP '88 Proceedings of the 1988 ACM conference on LISP and functional programming
Guaranteed-quality mesh generation for curved surfaces
SCG '93 Proceedings of the ninth annual symposium on Computational geometry
Points-to analysis in almost linear time
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Partial online cycle elimination in inclusion constraint graphs
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Off-line variable substitution for scaling points-to analysis
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Ultra-fast aliasing analysis using CLA: a million lines of C code in a second
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Pointer analysis: haven't we solved this problem yet?
PASTE '01 Proceedings of the 2001 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Cloning-based context-sensitive pointer alias analysis using binary decision diagrams
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Network Analysis: Methodological Foundations (Lecture Notes in Computer Science)
Network Analysis: Methodological Foundations (Lecture Notes in Computer Science)
A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2
ICPP '06 Proceedings of the 2006 International Conference on Parallel Processing
Optimistic parallelism requires abstractions
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
A performance study of general-purpose applications on graphics processors using CUDA
Journal of Parallel and Distributed Computing
On the energy efficiency of graphics processing units for scientific computing
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
Scaling Java points-to analysis using SPARK
CC'03 Proceedings of the 12th international conference on Compiler construction
Accelerating large graph algorithms on the GPU using CUDA
HiPC'07 Proceedings of the 14th international conference on High performance computing
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Proceedings of the 37th annual international symposium on Computer architecture
An effective GPU implementation of breadth-first search
Proceedings of the 47th Design Automation Conference
On the limits of GPU acceleration
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Parallel inclusion-based points-to analysis
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
EigenCFA: accelerating flow analysis with GPUs
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Accelerating CUDA graph algorithms at maximum warp
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Synthesizing concurrent schedulers for irregular algorithms
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
The tao of parallelism in algorithms
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Computing Strongly Connected Components in Parallel on CUDA
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
Efficient Parallel Graph Exploration on Multi-Core CPU and GPU
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Parallel replication-based points-to analysis
CC'12 Proceedings of the 21st international conference on Compiler Construction
Nested data-parallelism on the gpu
Proceedings of the 17th ACM SIGPLAN international conference on Functional programming
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Atomic-free irregular computations on GPUs
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
General transformations for GPU execution of tree traversals
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Parallel flow-sensitive pointer analysis by graph-rewriting
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Software Transactional Memory for GPU Architectures
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Time- and space-efficient flow-sensitive points-to analysis
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Graphics Processing Units (GPUs) have emerged as powerful accelerators for many regular algorithms that operate on dense arrays and matrices. In contrast, we know relatively little about using GPUs to accelerate highly irregular algorithms that operate on pointer-based data structures such as graphs. For the most part, research has focused on GPU implementations of graph analysis algorithms that do not modify the structure of the graph, such as algorithms for breadth-first search and strongly-connected components. In this paper, we describe a high-performance GPU implementation of an important graph algorithm used in compilers such as gcc and LLVM: Andersen-style inclusion-based points-to analysis. This algorithm is challenging to parallelize effectively on GPUs because it makes extensive modifications to the structure of the underlying graph and performs relatively little computation. In spite of this, our program, when executed on a 14 Streaming Multiprocessor GPU, achieves an average speedup of 7x compared to a sequential CPU implementation and outperforms a parallel implementation of the same algorithm running on 16 CPU cores. Our implementation provides general insights into how to produce high-performance GPU implementations of graph algorithms, and it highlights key differences between optimizing parallel programs for multicore CPUs and for GPUs.