Language and compiler support for auto-tuning variable-accuracy algorithms

Authors:
Jason Ansel;Yee Lok Wong;Cy Chan;Marek Olszewski;Alan Edelman;Saman Amarasinghe
Affiliations:
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA (all authors)
Venue:
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Year:
2011

Abstract

Approximating ideal program outputs is a common technique for solving computationally difficult problems, for adhering to processing or timing constraints, and for performance optimization in situations where perfect precision is not necessary. To this end, programmers often use approximation algorithms, iterative methods, data resampling, and other heuristics. However, programming such variable-accuracy algorithms presents difficult challenges, since the optimal algorithms and parameters may change with different accuracy requirements and usage environments. This problem is further compounded when multiple variable-accuracy algorithms are nested together, both because accuracy requirements can propagate across algorithms in complex ways and because the set of allowable compositions is large. As a result, programmers often deal with this issue in an ad hoc manner that can sometimes violate sound programming practices such as maintaining library abstractions. In this paper, we propose language extensions that expose trade-offs between time and accuracy to the compiler. The compiler performs fully automatic compile-time and install-time autotuning and analyses in order to construct optimized algorithms to achieve any given target accuracy. We present novel compiler techniques and a structured genetic tuning algorithm to search the space of candidate algorithms and accuracies in the presence of recursion and sub-calls to other variable-accuracy code. These techniques benefit both the library writer, by providing an easy way to describe and search the parameter and algorithmic choice space, and the library user, by allowing high-level specification of accuracy requirements, which are then met automatically without the need for the user to understand any algorithm-specific parameters. Additionally, we present a new suite of benchmarks, written in our language, to examine the efficacy of our techniques.
Our experimental results show that by relaxing accuracy requirements, we can easily obtain performance improvements ranging from 1.1× to orders of magnitude of speedup.
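The idea of tuning to a target accuracy can be illustrated with a small Python sketch. It is not the language extension or the genetic tuner described in the abstract: it uses a plain exhaustive search, takes the number of series terms as a stand-in for running time, and every name in it (leibniz, nilakantha, accuracy, tune, compute_pi, CANDIDATES, SIZES) is hypothetical and chosen only for illustration. Two algorithms approximate pi, and the tuner records, for each accuracy target, the cheapest algorithm and parameter that meets it, so a caller only states the accuracy it needs.

```python
import math


def leibniz(n):
    """Approximate pi with n terms of the Leibniz series (slow convergence)."""
    return 4.0 * sum((-1) ** k / (2 * k + 1) for k in range(n))


def nilakantha(n):
    """Approximate pi with n terms of the Nilakantha series (faster convergence)."""
    total = 3.0
    for k in range(1, n + 1):
        a = 2 * k
        total += (-1) ** (k + 1) * 4.0 / (a * (a + 1) * (a + 2))
    return total


def accuracy(estimate):
    """Accuracy metric (an assumption): -log10 of the absolute error, higher is better."""
    err = abs(estimate - math.pi)
    return 16.0 if err == 0 else -math.log10(err)


# Hypothetical search space: algorithmic choices and one tunable parameter.
CANDIDATES = {"leibniz": leibniz, "nilakantha": nilakantha}
SIZES = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]


def tune(targets):
    """Map each accuracy target to the cheapest (algorithm, n) that meets it.

    Cost proxy: the term count n. Exhaustive search, not a genetic algorithm.
    A target that no candidate meets maps to None.
    """
    table = {}
    for target in targets:
        best = None
        for name, fn in CANDIDATES.items():
            for n in SIZES:  # ascending, so the first hit is the cheapest for fn
                if accuracy(fn(n)) >= target:
                    if best is None or n < best[1]:
                        best = (name, n)
                    break
        table[target] = best
    return table


def compute_pi(target, table):
    """Library-user view: ask for an accuracy, not for algorithm parameters."""
    name, n = table[target]
    return CANDIDATES[name](n)


if __name__ == "__main__":
    tuned = tune([1, 3, 6])
    for target, choice in tuned.items():
        print(target, choice, compute_pi(target, tuned))
```

In this sketch the tuned table plays the role of a configuration produced ahead of time, while compute_pi hides the algorithm-specific parameter from the caller. A realistic tuner would measure actual running time on the target machine and would have to handle nested variable-accuracy calls, which this example leaves out.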