Parallel ocean general circulation modeling
Proceedings of the eleventh annual international conference of the Center for Nonlinear Studies on Experimental mathematics : computational issues in nonlinear science: computational issues in nonlinear science
Agile application-aware adaptation for mobility
Proceedings of the sixteenth ACM symposium on Operating systems principles
Self-similarity in World Wide Web traffic: evidence and possible causes
IEEE/ACM Transactions on Networking (TON)
Estimating the heavy tail index from scaling properties
Methodology and Computing in Applied Probability
Performance tuning and evaluation of a parallel community climate model
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Automatically tuned linear algebra software
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
An automatic design optimization tool and its application to computational fluid dynamics
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Convergence Properties of the Nelder--Mead Simplex Method in Low Dimensions
SIAM Journal on Optimization
Convergence of the Nelder--Mead Simplex Method to a Nonstationary Point
SIAM Journal on Optimization
Active harmony: towards automated performance tuning
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Compiler optimization-space exploration
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Scheduling From the Perspective of the Application
HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Autopilot: Adaptive Control of Distributed Applications
HPDC '98 Proceedings of the 7th IEEE International Symposium on High Performance Distributed Computing
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Using Information from Prior Runs to Improve Automated Tuning Systems
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Practical performance portability in the Parallel Ocean Program (POP): Research Articles
Concurrency and Computation: Practice & Experience - The High Performance Architectural Challenge: Mass Market versus Proprietary Components?
Multiple Page Size Modeling and Optimization
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
The Journal of Supercomputing
Methods of inference and learning for performance modeling of parallel applications
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Predicting parallel application performance via machine learning approaches: Research Articles
Concurrency and Computation: Practice & Experience - Parallel and Distributed Computing (EuroPar 2005)
Performance variability of highly parallel architectures
ICCS'03 Proceedings of the 2003 international conference on Computational science: PartIII
Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion
Run-time automatic performance tuning for multicore applications
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Towards fully automatic auto-tuning: Leveraging language features of Chapel
International Journal of High Performance Computing Applications
Hi-index | 0.00 |
In this paper, we present and evaluate a parallel algorithm for parameter tuning of parallel applications. We discuss the impact of performance variability on the accuracy and efficiency of the optimization algorithm and propose a strategy to minimize the impact of this variability. We evaluate our algorithm within the Active Harmony system, an automated online/offline tuning framework. We study its performance on three benchmark codes: PSTSWM, HPL and POP. Compared to the Nelder-Mead algorithm, our algorithm finds better configurations up to seven times faster. For POP, we were able to improve the performance of a production sized run by 59%.