An Infrastructure for Tackling Input-Sensitivity of GPU Program Optimizations

Authors:
Xipeng Shen;Yixun Liu;Eddy Z. Zhang;Poornima Bhamidipati
Affiliations:
Computer Science Department, College of William and Mary, Williamsburg, USA;Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, Bethesda, USA 20892-1182;Department of Computer Science, Rutgers, The State University of New Jersey, New Brunswick, USA 08901;Capital One, Williamsburg, USA 23185
Venue:
International Journal of Parallel Programming
Year:
2013

Abstract

Graphics processing units (GPUs) are increasingly adopted to improve computing throughput. However, developing a high-quality GPU application is challenging because the optimization space is large and optimizations have complex, hard-to-predict effects on GPU program performance. Many recent efforts employ empirical search-based auto-tuners to tackle the problem, but few have concentrated on how program inputs influence the optimizations. In this paper, based on a set of CUDA and OpenCL kernels, we report evidence of the importance of auto-tuners adapting to program input changes, and present a framework, G-ADAPT+, that addresses this influence by constructing cross-input predictive models that automatically predict the (near-)optimal configurations for an arbitrary input to a GPU program. G-ADAPT+ is based on the source-to-source compilers Cetus and ROSE, and it supports the optimization of both CUDA and OpenCL programs.
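
The idea of a cross-input predictive model can be illustrated with a small sketch. It is not the G-ADAPT+ implementation: the abstract does not specify the model, so a nearest-neighbor lookup stands in for it here. The names `CONFIGS`, `simulated_runtime`, `best_config`, `train`, and `predict` are hypothetical, and the cost function is a made-up stand-in for actually timing a kernel on a GPU.

```python
import math

# Hypothetical tunable parameter: candidate thread-block sizes.
CONFIGS = [64, 128, 256, 512]


def simulated_runtime(n, block_size):
    """Made-up cost model (an assumption, not a measurement).

    A real auto-tuner would launch the kernel with this configuration
    on an input of size n and time it instead.
    """
    return n / block_size + 0.01 * block_size


def best_config(n):
    """Empirical search: try every configuration and keep the fastest."""
    return min(CONFIGS, key=lambda b: simulated_runtime(n, b))


def train(training_inputs):
    """Record (input feature, best configuration) pairs from offline search.

    The only input feature used here is the logarithm of the input size.
    """
    return [(math.log(n), best_config(n)) for n in training_inputs]


def predict(model, n):
    """Predict a configuration for an unseen input via its nearest neighbor."""
    x = math.log(n)
    _, config = min(model, key=lambda entry: abs(entry[0] - x))
    return config


# Offline: search once per training input.
model = train([50, 100, 400, 2000])

# Online: choose a configuration for a new input without searching again.
chosen = predict(model, 500)
```

Under this toy cost model, the best block size grows with the input size, so a single configuration tuned on one input would be suboptimal on others. The lookup recovers the same choice as a full search for an input of size 500 while skipping the search at run time.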