PetaBricks: a language and compiler for algorithmic choice

Authors:
Jason Ansel;Cy Chan;Yee Lok Wong;Marek Olszewski;Qin Zhao;Alan Edelman;Saman Amarasinghe
Affiliations:
Massachusetts Institute of Technology, Cambridge, MA, USA;Massachusetts Institute of Technology, Cambridge, MA, USA;Massachusetts Institute of Technology, Cambridge, MA, USA;Massachusetts Institute of Technology, Cambridge, MA, USA;Massachusetts Institute of Technology, Cambridge, MA, USA;Massachusetts Institute of Technology, Cambridge, MA, USA;Massachusetts Institute of Technology, Cambridge, MA, USA
Venue:
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Year:
2009

Abstract

It is often impossible to obtain a one-size-fits-all solution for high-performance algorithms when considering different choices for data distributions, parallelism, transformations, and blocking. The best combination of these choices is often tightly coupled to the architecture, problem size, data, and available system resources. In some cases, completely different algorithms may provide the best performance. Current compiler and programming language techniques are able to change some of these parameters, but today there is no simple way for the programmer to express, or for the compiler to choose, different algorithms to handle different parts of the data. Existing solutions normally can handle only coarse-grained, library-level selections or hand-coded cutoffs between base cases and recursive cases. We present PetaBricks, a new implicitly parallel language and compiler where having multiple implementations of multiple algorithms to solve a problem is the natural way of programming. We make algorithmic choice a first-class construct of the language. Choices are provided in a way that also allows our compiler to tune at a finer granularity. The PetaBricks compiler autotunes programs by making both fine-grained and algorithmic choices. Choices also include different automatic parallelization techniques, data distributions, algorithmic parameters, transformations, and blocking. Additionally, we introduce novel techniques to autotune algorithms for different convergence criteria. When choosing between various direct and iterative methods, the PetaBricks compiler is able to tune a program so that it delivers near-optimal efficiency for any desired level of accuracy. The compiler has the flexibility to use different convergence criteria for the various components within a single algorithm, providing the user with accuracy choice alongside algorithmic choice.
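
To make the idea of a tuned cutoff between base cases and recursive cases concrete, the following is a minimal Python sketch and not PetaBricks syntax or the compiler's actual tuning method. It assumes a sort with two interchangeable algorithms (insertion sort and merge sort) and a brute-force tuner that times a few candidate cutoffs. The names `hybrid_sort`, `autotune_cutoff`, and `cutoff` are hypothetical and chosen only for illustration.

```python
import random
import time


def insertion_sort(xs):
    """Quadratic sort that tends to do well on very small inputs."""
    out = list(xs)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out


def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def hybrid_sort(xs, cutoff):
    """Algorithmic choice: insertion sort at or below `cutoff`, merge sort above."""
    # max(cutoff, 1) guarantees the recursion stops at single-element lists.
    if len(xs) <= max(cutoff, 1):
        return insertion_sort(xs)
    mid = len(xs) // 2
    return merge(hybrid_sort(xs[:mid], cutoff), hybrid_sort(xs[mid:], cutoff))


def autotune_cutoff(candidates, n=2000, trials=3, seed=0):
    """Hypothetical tuner: time each candidate cutoff and return the fastest."""
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]
    best_cutoff, best_time = None, float("inf")
    for cutoff in candidates:
        start = time.perf_counter()
        for _ in range(trials):
            hybrid_sort(data, cutoff)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_cutoff, best_time = cutoff, elapsed
    return best_cutoff
```

A caller might run `cutoff = autotune_cutoff([1, 4, 16, 64, 256])` once per machine and then call `hybrid_sort(data, cutoff)`. The selected cutoff depends on the hardware and input size, which is the coupling between choices and environment that the abstract describes.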