A framework for adaptive algorithm selection in STAPL

Authors:
Nathan Thomas; Gabriel Tanase; Olga Tkachyshyn; Jack Perdue; Nancy M. Amato; Lawrence Rauchwerger
Affiliations:
Texas A&M University (all authors)
Venue:
Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Year:
2005

Abstract

Writing portable programs that perform well on multiple platforms or for varying input sizes and types can be very difficult because performance is often sensitive to the system architecture, the run-time environment, and input data characteristics. This is even more challenging on parallel and distributed systems due to the wide variety of system architectures. One way to address this problem is to adaptively select the best parallel algorithm for the current input data and system from a set of functionally equivalent algorithmic options. Toward this goal, we have developed a general framework for adaptive algorithm selection for use in the Standard Template Adaptive Parallel Library (STAPL). Our framework uses machine learning techniques to analyze data collected by STAPL installation benchmarks and to determine tests that will select among algorithmic options at run-time. We apply a prototype implementation of our framework to two important parallel operations, sorting and matrix multiplication, on multiple platforms and show that the framework determines run-time tests that correctly select the best performing algorithm from among several competing algorithmic options in 86-100% of the cases studied, depending on the operation and the system.