Supercompilers for parallel and vector computers
Supercompilers for parallel and vector computers
Run-Time Parallelization and Scheduling of Loops
IEEE Transactions on Computers
High-level optimization via automated statistical modeling
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Software—Practice & Experience
Dynamic feedback: an effective technique for adaptive computing
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
A fast Fourier transform compiler
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Querying very large multi-dimensional datasets in ADR
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Parallelization of a dynamic unstructured application using three leading paradigms
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Adaptive reduction parallelization techniques
Proceedings of the 14th international conference on Supercomputing
SPL: a language and compiler for DSP algorithms
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Parallel Programming with Polaris
Computer
Compiler and Runtime Support for Irregular Reductions on a Multithreaded Architecture
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Experience in the Automatic Parallelization of Four Perfect-Benchmark Programs
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Improving Compiler and Run-Time Support for Adaptive Irregular Codes
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Repartitioning Unstructured Adaptive Meshes
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
A Dynamically Tuned Sorting Library
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A framework for adaptive algorithm selection in STAPL
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
STAPL: an adaptive, generic parallel C++ library
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Parallel reductions: an application of adaptive algorithm selection
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
An analytical model of locality-based parallel irregular reductions
Parallel Computing
Deque-Free Work-Optimal Parallel STL Algorithms
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Variant-based competitive parallel execution of sequential programs
Proceedings of the 7th ACM international conference on Computing frontiers
SC'08 Proceedings of the 7th international conference on Software composition
Auto-tuning SkePU: a multi-backend skeleton programming framework for multi-GPU systems
Proceedings of the 4th International Workshop on Multicore Software Engineering
Comparing machine learning approaches for context-aware composition
SC'11 Proceedings of the 10th international conference on Software composition
Optimized composition of performance-aware parallel components
Concurrency and Computation: Practice & Experience
Adaptation of legacy codes to context-aware composition using aspect-oriented programming
SC'12 Proceedings of the 11th international conference on Software Composition
Hi-index | 0.00 |
Irregular and dynamic memory reference patterns can cause performance variations for low level algorithms in general and for parallel algorithms in particular. In this paper, we present an adaptive algorithm selection framework which can collect and interpret the characteristics of a particular instance of parallel reduction algorithms and select the best performing one from an existing library. The framework consists of the following components: 1) an offline systematic process for characterizing the input sensitivity of parallel reduction algorithms and a method for building corresponding predictive performance models, 2) an online input characterization and algorithm selection module, and 3) a small library of parallel reduction algorithms, which represent the algorithmic choices made available at runtime. We also present one possible integration of this framework in a restructuring compiler. We validate our design experimentally and show that our framework 1) selects the most appropriate algorithms in 85 percent of the cases studied, 2) overall, delivers 98 percent of the optimal performance, 3) adaptively selects the best algorithms for dynamic phases of a running program (resulting in performance improvements otherwise not possible), and 4) adapts to the underlying machine architectures (evaluated on IBM Regatta and HP V-Class systems).