A GPU implementation of inclusion-based points-to analysis

Authors: Mario Mendez-Lojo (University of Texas, Austin, TX, USA); Martin Burtscher (Texas State University, San Marcos, TX, USA); Keshav Pingali (University of Texas, Austin, TX, USA)
Venue: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Year: 2012

Abstract

Graphics Processing Units (GPUs) have emerged as powerful accelerators for many regular algorithms that operate on dense arrays and matrices. In contrast, we know relatively little about using GPUs to accelerate highly irregular algorithms that operate on pointer-based data structures such as graphs. For the most part, research has focused on GPU implementations of graph analysis algorithms that do not modify the structure of the graph, such as breadth-first search and strongly connected components. In this paper, we describe a high-performance GPU implementation of an important graph algorithm used in compilers such as gcc and LLVM: Andersen-style inclusion-based points-to analysis. This algorithm is challenging to parallelize effectively on GPUs because it makes extensive modifications to the structure of the underlying graph and performs relatively little computation. Nevertheless, our program, when executed on a GPU with 14 streaming multiprocessors, achieves an average speedup of 7x over a sequential CPU implementation and outperforms a parallel implementation of the same algorithm running on 16 CPU cores. Our implementation provides general insights into how to produce high-performance GPU implementations of graph algorithms, and it highlights key differences between optimizing parallel programs for multicore CPUs and for GPUs.
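This minimal sequential Python sketch illustrates why inclusion-based points-to analysis keeps changing its own graph. It is not the authors' GPU implementation. It assumes a simple tuple encoding of the four standard constraint kinds: `p = &a`, `p = q`, `p = *q`, and `*p = q`. The function name `andersen` and this encoding are illustrative choices, not taken from the paper.

Copy constraints act as edges along which points-to sets flow. Load and store constraints add *new* copy edges whenever a points-to set grows. This is the kind of structural modification, paired with little arithmetic, that the abstract describes.

```python
from collections import defaultdict, deque


def andersen(addr_of, copies, loads, stores):
    """Worklist sketch of Andersen-style (inclusion-based) points-to analysis.

    Illustrative encoding (an assumption, not the paper's representation):
      addr_of: (p, a) pairs for  p = &a
      copies:  (p, q) pairs for  p = q    (pts(q) is a subset of pts(p))
      loads:   (p, q) pairs for  p = *q
      stores:  (p, q) pairs for  *p = q
    """
    pts = defaultdict(set)         # variable -> abstract locations it may point to
    copy_succ = defaultdict(set)   # copy edge q -> p: pts(q) flows into pts(p)
    load_of = defaultdict(set)     # q -> {p | p = *q}
    store_into = defaultdict(set)  # p -> {q | *p = q}

    for p, a in addr_of:
        pts[p].add(a)
    for p, q in copies:
        copy_succ[q].add(p)
    for p, q in loads:
        load_of[q].add(p)
    for p, q in stores:
        store_into[p].add(q)

    worklist = deque(pts.keys())   # start from variables with non-empty sets
    while worklist:
        v = worklist.popleft()
        # Complex constraints: every location a that v may point to
        # adds new copy edges, so the constraint graph itself changes.
        for a in list(pts[v]):
            for p in load_of[v]:       # p = *v  adds edge a -> p
                if p not in copy_succ[a]:
                    copy_succ[a].add(p)
                    worklist.append(a)  # revisit a to push pts(a) into p
            for q in store_into[v]:    # *v = q  adds edge q -> a
                if a not in copy_succ[q]:
                    copy_succ[q].add(a)
                    worklist.append(q)  # revisit q to push pts(q) into a
        # Simple constraints: propagate pts(v) along outgoing copy edges.
        for w in list(copy_succ[v]):
            before = len(pts[w])
            pts[w] |= pts[v]
            if len(pts[w]) != before:
                worklist.append(w)
    return {v: s for v, s in pts.items() if s}


# Example program:  a = &x; b = &y; p = &a; q = p; *q = b; c = *p
result = andersen(
    addr_of=[("a", "x"), ("b", "y"), ("p", "a")],
    copies=[("q", "p")],
    loads=[("c", "p")],
    stores=[("q", "b")],
)
print(sorted((v, sorted(s)) for v, s in result.items()))
```

Walking through the example:

- Once `q` is found to point to `a`, the store `*q = b` adds the edge `b -> a`, so `a` may point to both `x` and `y`.
- The load `c = *p` adds the edge `a -> c`, so `c` inherits that set.

Neither edge exists in the initial graph. Creating such edges concurrently is what makes parallelizing this analysis on a GPU harder than parallelizing read-only traversals such as BFS.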