Divergence analysis

Authors:
Diogo Sampaio; Rafael Martins de Souza; Sylvain Collange; Fernando Magno Quintão Pereira
Affiliations:
Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; INRIA, Rennes Cedex, France; Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Venue:
ACM Transactions on Programming Languages and Systems (TOPLAS)
Year:
2014

Abstract

Growing interest in graphics processing units has brought renewed attention to the Single Instruction Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, programming them is still challenging. In particular, developers must deal with memory and control-flow divergences. These phenomena stem from a condition that we call data divergence, which occurs whenever two processing elements (PEs) see the same variable name holding different values. This article introduces divergence analysis, a static analysis that discovers data divergences. This analysis, currently deployed in an industrial-quality compiler, is useful in several ways: it improves the translation of SIMD code to non-SIMD CPUs, it helps developers manually improve their SIMD applications, and it guides the automatic optimization of SIMD programs. We demonstrate this last point by introducing the notion of a divergence-aware register spiller. This spiller uses information from our analysis to either rematerialize or share common data between PEs. As evidence of its effectiveness, we have tested it on a suite of 395 CUDA kernels from well-known benchmarks. The divergence-aware spiller produces GPU code that is 26.21% faster than the code produced by the register allocator used in the baseline compiler.
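
To make the notion of data divergence concrete, the following Python sketch marks a variable as divergent when its value may depend on a per-PE seed such as a thread identifier. It is a minimal illustration under stated assumptions, not the algorithm from the article: it follows data dependences only and ignores control dependences, which a full analysis would also need to account for. The function name `divergent_variables`, the toy `program`, and the variable names `tid` and `n` are all hypothetical.

```python
from collections import defaultdict, deque


def divergent_variables(defs, seeds):
    """Return the variables that may hold different values on different PEs.

    defs  -- maps each defined variable to the variables its definition reads
    seeds -- variables assumed divergent from the start (e.g. a thread id)

    Assumption: only data dependences are tracked; control dependences
    are deliberately left out of this sketch.
    """
    # Reverse edges: for each variable, the variables defined from it.
    users = defaultdict(set)
    for dest, operands in defs.items():
        for op in operands:
            users[op].add(dest)

    # Worklist propagation: anything computed from a divergent
    # variable is itself considered divergent.
    divergent = set(seeds)
    worklist = deque(seeds)
    while worklist:
        var = worklist.popleft()
        for user in users[var]:
            if user not in divergent:
                divergent.add(user)
                worklist.append(user)
    return divergent


# Hypothetical kernel body, one definition per variable:
#   base = n * 4      (n is a kernel parameter, identical on every PE)
#   idx  = tid + base
#   val  = idx * 2
#   lim  = base + 1
program = {
    "base": ["n"],
    "idx": ["tid", "base"],
    "val": ["idx"],
    "lim": ["base"],
}

result = divergent_variables(program, seeds=["tid"])
uniform = set(program) - result

print(sorted(result))   # ['idx', 'tid', 'val']
print(sorted(uniform))  # ['base', 'lim']
```

In this toy program, `base` and `lim` are never reached from `tid`, so every PE would hold the same value for them. Variables of this kind are the ones a divergence-aware spiller could, in principle, share between PEs or rematerialize instead of storing once per PE.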