Approximate computing, in which computation accuracy is traded for better performance or higher data throughput, is one solution that can help data processing keep pace with the current and growing overabundance of information. In particular domains such as multimedia and learning algorithms, approximation is commonly used today. We consider automation to be essential to providing transparent approximation, and we show that larger benefits can be achieved by constructing the approximation techniques to fit the underlying hardware. Our target platform is the GPU because of its high performance capabilities and its programming challenges, which proper automation can alleviate. Our approach, SAGE, combines a static compiler that automatically generates a set of CUDA kernels with varying levels of approximation with a run-time system that iteratively selects among the available kernels to achieve speedup while adhering to a target output quality set by the user. The SAGE compiler employs three optimization techniques to generate approximate kernels that exploit the GPU microarchitecture: selective discarding of atomic operations, data packing, and thread fusion. Across a set of machine learning and image processing kernels, SAGE's approximation yields an average 2.5x speedup with less than 10% quality loss compared to accurate execution on an NVIDIA GTX 560 GPU.
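The run-time side of this design can be illustrated with a small sketch. The following Python snippet is a hypothetical simplification of SAGE's selection mechanism, not the paper's actual implementation: it assumes the compiler has emitted a list of kernel variants whose runtime and output quality have already been measured, and it greedily picks the fastest variant that still meets the user's quality target, falling back to the most accurate variant if none qualifies.

```python
def select_kernel(variants, target_quality):
    """Pick the fastest kernel variant meeting the quality target.

    variants: list of dicts with illustrative keys
      'name'    -- variant identifier (hypothetical)
      'time'    -- measured run time, e.g. in milliseconds
      'quality' -- measured output quality relative to exact execution (0..1)
    target_quality: minimum acceptable quality set by the user.
    """
    # Keep only variants that satisfy the user's quality constraint.
    acceptable = [v for v in variants if v["quality"] >= target_quality]
    if acceptable:
        # Among acceptable variants, prefer the fastest one.
        return min(acceptable, key=lambda v: v["time"])
    # No variant meets the target: fall back to the most accurate kernel.
    return max(variants, key=lambda v: v["quality"])


# Example: an exact kernel plus two approximate variants (invented numbers).
variants = [
    {"name": "exact",        "time": 1.00, "quality": 1.00},
    {"name": "data_packing", "time": 0.50, "quality": 0.95},
    {"name": "thread_fuse",  "time": 0.30, "quality": 0.85},
]
print(select_kernel(variants, 0.90)["name"])  # data_packing
```

In the real system the selection is iterative and quality is monitored online rather than known up front; this one-shot version only conveys the trade-off being navigated.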