EigenCFA: accelerating flow analysis with GPUs

Authors:
Tarun Prabhu; Shreyas Ramalingam; Matthew Might; Mary Hall
Affiliations:
University of Utah, Salt Lake City, UT, USA; University of Utah, Salt Lake City, UT, USA; University of Utah, Salt Lake City, UT, USA; University of Utah, Salt Lake City, UT, USA
Venue:
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Year:
2011

References (23):
Elimination algorithms for data flow analysis. ACM Computing Surveys (CSUR).
Control flow analysis in Scheme. PLDI '88: Proceedings of the ACM SIGPLAN 1988 conference on Programming Language Design and Implementation.
An efficient hybrid algorithm for incremental data flow analysis. POPL '90: Proceedings of the 17th ACM SIGPLAN-SIGACT symposium on Principles of programming languages.
Performing data flow analysis in parallel. Proceedings of the 1990 ACM/IEEE conference on Supercomputing.
Control-flow analysis of higher-order languages or taming lambda.
Closure analysis in constraint form. ACM Transactions on Programming Languages and Systems (TOPLAS).
A program data flow analysis procedure. Communications of the ACM.
A unified approach to global program optimization. POPL '73: Proceedings of the 1st annual ACM SIGACT-SIGPLAN symposium on Principles of programming languages.
Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. POPL '77: Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages.
Flow Analysis of Computer Programs.
Systematic design of program analysis frameworks. POPL '79: Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages.
The Combining DAG: A Technique for Parallel Data Flow Analysis. IEEE Transactions on Parallel and Distributed Systems.
Region Analysis: A Parallel Elimination Method for Data Flow Analysis. IEEE Transactions on Software Engineering.
Flow Analysis of Lambda Expressions (Preliminary Version). Proceedings of the 8th Colloquium on Automata, Languages and Programming.
Improving flow analyses via ΓCFA: abstract garbage collection and counting. Proceedings of the eleventh ACM SIGPLAN international conference on Functional programming.
Relating complexity and precision in control flow analysis. ICFP '07: Proceedings of the 12th ACM SIGPLAN international conference on Functional programming.
Subcubic algorithms for recursive state machines. Proceedings of the 35th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages.
Deciding kCFA is complete for EXPTIME. Proceedings of the 13th ACM SIGPLAN international conference on Functional programming.
Implementing sparse matrix-vector multiplication on throughput-oriented processors. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
Resolving and exploiting the k-CFA paradox: illuminating functional vs. object-oriented program analysis. PLDI '10: Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation.
Parallel inclusion-based points-to analysis. Proceedings of the ACM international conference on Object oriented programming systems languages and applications.
Distributed and predictable software model checking. VMCAI'11: Proceedings of the 12th international conference on Verification, model checking, and abstract interpretation.
A fast implementation of the octagon abstract domain on graphics hardware. SAS'07: Proceedings of the 14th international conference on Static Analysis.

Cited by (11):
Distributed and predictable software model checking. VMCAI'11: Proceedings of the 12th international conference on Verification, model checking, and abstract interpretation.
A GPU implementation of inclusion-based points-to analysis. Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming.
Parallelizing top-down interprocedural analyses. Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation.
Parallel replication-based points-to analysis. CC'12: Proceedings of the 21st international conference on Compiler Construction.
Exact flow analysis by higher-order model checking. FLOPS'12: Proceedings of the 11th international conference on Functional and Logic Programming.
GPUstore: harnessing GPU computing for storage systems in the OS kernel. Proceedings of the 5th Annual International Systems and Storage Conference.
Binary reachability analysis of higher order functional programs. SAS'12: Proceedings of the 19th international conference on Static Analysis.
Morph algorithms on GPUs. Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming.
Scalable and incremental software bug detection. Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering.
Divergence analysis. ACM Transactions on Programming Languages and Systems (TOPLAS).
Time- and space-efficient flow-sensitive points-to analysis. ACM Transactions on Architecture and Code Optimization (TACO).

Abstract

We describe, implement and benchmark EigenCFA, an algorithm for accelerating higher-order control-flow analysis (specifically, 0CFA) with a GPU. Our program transformations, reductions and optimizations ultimately achieve a 72-fold speedup over an optimized CPU implementation. We began our investigation with the view that GPUs accelerate arithmetic-intensive, data-parallel computations but tolerate branching poorly. Taking that perspective to its limit, we reduced Shivers's abstract-interpretive 0CFA to an algorithm synthesized from linear-algebra operations. Central to this reduction were "abstract" Church encodings and encodings of the syntax tree and abstract domains as vectors and matrices. A straightforward (dense-matrix) implementation of EigenCFA ran slower than a fast CPU implementation. Sparse-matrix data structures and operations turned out to be the critical accelerants: because control-flow graphs are sparse in practice (up to 96% empty), our control-flow matrices are also sparse, which gives sparse-matrix operations an overwhelming advantage in both space and speed. We also achieved speedups by carefully permitting data races. The monotonicity of 0CFA makes it sound to perform analysis operations in parallel, even when they read stale or partially updated data.
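The idea of recasting 0CFA as linear algebra can be illustrated with the minimal sketch below. It is not the paper's encoding and makes several assumptions. The language is CPS-style: each call site applies an operator term to arguments, and every operator and argument is either a variable or a lambda. Every call site is treated as reachable. All names (`spmm`, `transpose`, `zero_cfa`, `fun`, `args`, `params`) are hypothetical.

Boolean sparse matrices are stored as lists of row sets, so each product only touches nonzero entries. One analysis step is a handful of boolean matrix products:

- The operator matrix times the term-evaluation matrix gives the lambdas that reach each call site.
- Composing that result with each parameter matrix and the matching argument values gives the new variable bindings.

Each step only joins facts into the store, so the store grows monotonically and the loop stops once nothing changes. The sketch itself is sequential and does not model the racy GPU updates.

```python
# Hypothetical sketch (not the paper's encoding): 0CFA as a fixpoint of
# boolean sparse-matrix products. A sparse boolean matrix is a list of rows;
# each row is the set of column indices that hold a 1 (CSR without values).

def spmm(a, b):
    """Boolean sparse product: row r of a*b is the union of b[k] for k in a[r]."""
    return [set().union(*(b[k] for k in row)) for row in a]

def transpose(a, ncols):
    """Transpose a sparse boolean matrix that has ncols columns."""
    t = [set() for _ in range(ncols)]
    for r, row in enumerate(a):
        for c in row:
            t[c].add(r)
    return t

def zero_cfa(n_vars, n_lams, fun, args, params):
    """Iterate an assumed 0CFA transfer function to its least fixpoint.

    Terms are numbered variables first (0..n_vars-1), then lambdas.
    fun:    calls x terms; fun[c] holds the operator term of call c
    args:   one calls x terms matrix per argument position i
    params: one lams x vars matrix per argument position i
    Returns the store (vars x lams): store[v] = lambdas that may bind to v.
    """
    store = [set() for _ in range(n_vars)]
    while True:
        # Evaluation matrix (terms x lams): a variable evaluates to its
        # store row, and a lambda term evaluates to itself.
        evals = store + [{j} for j in range(n_lams)]
        ops = spmm(fun, evals)                           # calls x lams
        new = [set(row) for row in store]
        for arg_i, param_i in zip(args, params):
            vals = spmm(arg_i, evals)                    # calls x lams
            binds = spmm(ops, param_i)                   # calls x vars
            flow = spmm(transpose(binds, n_vars), vals)  # vars x lams
            for v in range(n_vars):
                new[v] |= flow[v]                        # join: facts only grow
        if new == store:
            return store
        store = new

# Example program: ((λ(x k). (k x)) (λ(y). (y y)) (λ(z). (z z)))
# Variables: x=0, k=1, y=2, z=3. Lambdas 0, 1, 2 are term indices 4, 5, 6.
X, K, Y, Z = 0, 1, 2, 3

def term(lam):
    """Term index of lambda number lam (lambdas follow the 4 variables)."""
    return 4 + lam

fun = [{term(0)}, {K}, {Y}, {Z}]          # operators of calls c0..c3
args = [[{term(1)}, {X}, {Y}, {Z}],       # argument position 0
        [{term(2)}, set(), set(), set()]] # argument position 1
params = [[{X}, {Y}, {Z}],                # first parameter of lambdas 0..2
          [{K}, set(), set()]]            # second parameter
store = zero_cfa(4, 3, fun, args, params)
print(store)  # [{1}, {2}, {1}, {1}]
```

In this example, `x` is bound to `λ(y). (y y)` (lambda 1) and `k` to `λ(z). (z z)` (lambda 2). Because `(k x)` passes `x` to lambda 2, `z` and then `y` also end up bound to lambda 1. A dense variant would replace each row set with a bit vector, which costs work proportional to the full matrix even when most entries are zero.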