Writing portable programs that perform well on multiple platforms or for varying input sizes and types can be very difficult because performance is often sensitive to the system architecture, the run-time environment, and input data characteristics. This is even more challenging on parallel and distributed systems due to the wide variety of system architectures. One way to address this problem is to adaptively select the best parallel algorithm for the current input data and system from a set of functionally equivalent algorithmic options. Toward this goal, we have developed a general framework for adaptive algorithm selection for use in the Standard Template Adaptive Parallel Library (STAPL). Our framework uses machine learning techniques to analyze data collected by STAPL installation benchmarks and to determine tests that will select among algorithmic options at run-time. We apply a prototype implementation of our framework to two important parallel operations, sorting and matrix multiplication, on multiple platforms and show that the framework determines run-time tests that correctly select the best performing algorithm from among several competing algorithmic options in 86-100% of the cases studied, depending on the operation and the system.
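To make the idea of learning a run-time test from benchmark data concrete, here is a minimal sketch in Python. It is not STAPL code and does not reproduce the framework's actual learners, features, or algorithms. The feature (`inversion_fraction`), the one-threshold learner (`learn_threshold`), the two sequential sorts, and the training records are all hypothetical and chosen only for illustration. The training records are made up and are not measurements from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def insertion_sort(xs: Sequence[int]) -> List[int]:
    """Simple insertion sort; tends to do well on nearly sorted input."""
    out = list(xs)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out


def merge_sort(xs: Sequence[int]) -> List[int]:
    """Top-down merge sort; insensitive to input order."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


# Functionally equivalent options the selector chooses among (hypothetical set).
ALGORITHMS: Dict[str, Callable[[Sequence[int]], List[int]]] = {
    "insertion": insertion_sort,
    "merge": merge_sort,
}


def inversion_fraction(xs: Sequence[int]) -> float:
    """Cheap input feature: fraction of adjacent pairs that are out of order."""
    if len(xs) < 2:
        return 0.0
    bad = sum(1 for a, b in zip(xs, xs[1:]) if a > b)
    return bad / (len(xs) - 1)


def learn_threshold(samples: Sequence[Tuple[float, str]]) -> float:
    """Fit a one-feature decision stump.

    Picks the threshold t that minimizes misclassifications of the rule
    'feature <= t -> insertion, otherwise merge' on the labeled samples.
    """
    best_t, best_err = None, len(samples) + 1
    for t in sorted({f for f, _ in samples}):
        err = sum(
            1 for f, label in samples
            if ("insertion" if f <= t else "merge") != label
        )
        if err < best_err:
            best_t, best_err = t, err
    return best_t


# Made-up stand-in for installation benchmark results:
# (feature value, name of the algorithm that was fastest on that input).
TRAINING = [
    (0.01, "insertion"), (0.05, "insertion"), (0.10, "insertion"),
    (0.40, "merge"), (0.50, "merge"), (0.55, "merge"),
]

# "Install time": learn the run-time test once from the benchmark data.
THRESHOLD = learn_threshold(TRAINING)


def adaptive_sort(xs: Sequence[int]) -> Tuple[str, List[int]]:
    """Run time: evaluate the learned test, then dispatch to the chosen sort."""
    name = "insertion" if inversion_fraction(xs) <= THRESHOLD else "merge"
    return name, ALGORITHMS[name](xs)
```

Calling `adaptive_sort(data)` returns the name of the selected algorithm together with the sorted list. The sketch separates an offline learning step from a cheap test evaluated per call, which is the structure the abstract describes. A realistic selector would presumably use more features (for example input size and processor count) and a richer model than a single threshold.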