It is often impossible to obtain a one-size-fits-all solution for high-performance algorithms when considering different choices for data distributions, parallelism, transformations, and blocking. The best choice is often tightly coupled to the architecture, problem size, data, and available system resources. In some cases, completely different algorithms may provide the best performance. Current compiler and programming language techniques can change some of these parameters, but today there is no simple way for the programmer to express, or for the compiler to choose, different algorithms to handle different parts of the data. Existing solutions can normally handle only coarse-grained, library-level selections or hand-coded cutoffs between base cases and recursive cases. We present PetaBricks, a new implicitly parallel language and compiler in which having multiple implementations of multiple algorithms to solve a problem is the natural way of programming. We make algorithmic choice a first-class construct of the language. Choices are provided in a way that also allows our compiler to tune at a finer granularity. The PetaBricks compiler autotunes programs by making both fine-grained and algorithmic choices. Choices also include different automatic parallelization techniques, data distributions, algorithmic parameters, transformations, and blocking. Additionally, we introduce novel techniques to autotune algorithms for different convergence criteria. When choosing between various direct and iterative methods, the PetaBricks compiler can tune a program so that it delivers near-optimal efficiency for any desired level of accuracy. The compiler has the flexibility to use different convergence criteria for the various components within a single algorithm, providing the user with accuracy choice alongside algorithmic choice.
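To make the idea of a tuned cutoff between a base case and a recursive case concrete, here is a minimal sketch in plain Python. It is not PetaBricks syntax and not the PetaBricks autotuner: the names `hybrid_sort` and `autotune_cutoff` are hypothetical, the two competing algorithms (insertion sort and merge sort) are chosen only for illustration, and the search is a simple exhaustive timing loop over a handful of candidate cutoffs.

```python
import random
import time


def insertion_sort(xs):
    """Base-case algorithm: quadratic, but cheap on short inputs."""
    out = list(xs)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out


def merge(a, b):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def hybrid_sort(xs, cutoff):
    """Recursive merge sort that switches to insertion sort at `cutoff`.

    `cutoff` is the tunable algorithmic choice (a hypothetical parameter).
    The max(..., 1) guard keeps the recursion finite for cutoff < 1.
    """
    if len(xs) <= max(cutoff, 1):
        return insertion_sort(xs)
    mid = len(xs) // 2
    return merge(hybrid_sort(xs[:mid], cutoff), hybrid_sort(xs[mid:], cutoff))


def autotune_cutoff(candidates, n=2000, trials=3, seed=0):
    """Time each candidate cutoff on random data and return the fastest.

    Assumption: exhaustive search over a small candidate list stands in
    for a real autotuner's search strategy.
    """
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]
    best_cutoff, best_time = None, float("inf")
    for cutoff in candidates:
        elapsed = float("inf")
        for _ in range(trials):
            start = time.perf_counter()
            hybrid_sort(data, cutoff)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_cutoff, best_time = cutoff, elapsed
    return best_cutoff


if __name__ == "__main__":
    # The winning cutoff depends on the machine and the input size.
    print("tuned cutoff:", autotune_cutoff([1, 4, 8, 16, 32, 64]))
```

The result differs between machines and input sizes, which is the coupling to architecture and problem size that the abstract describes. A hand-coded cutoff fixes one such value at development time, whereas a tuner re-derives it for each target.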