PVM: a framework for parallel distributed computing
Concurrency: Practice and Experience
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology
ICS '97 Proceedings of the 11th international conference on Supercomputing
DyC: an expressive annotation-directed dynamic compiler for C
Theoretical Computer Science - Partial evaluation and semantics-based program manipulation
High-level adaptive program optimization with ADAPT
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Automatically tuned linear algebra software
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Active harmony: towards automated performance tuning
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Autopilot: Adaptive Control of Distributed Applications
HPDC '98 Proceedings of the 7th IEEE International Symposium on High Performance Distributed Computing
Tuning High Performance Kernels through Empirical Compilation
ICPP '05 Proceedings of the 2005 International Conference on Parallel Processing
Sparsity: Optimization Framework for Sparse Matrix Kernels
International Journal of High Performance Computing Applications
Concurrency and Computation: Practice & Experience - European–American Working Group on Automatic Performance Analysis (APART)
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
A scalable auto-tuning framework for compiler optimization
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
Input-driven dynamic execution prediction of streaming applications
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Speeding up Nek5000 with autotuning and specialization
Proceedings of the 24th ACM International Conference on Supercomputing
Run-time optimizations for replicated dataflows on heterogeneous environments
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Experiences in using cetus for source-to-source transformations
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Automatically tuning parallel and parallelized programs
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Effective source-to-source outlining to support whole program empirical optimization
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Hi-index | 0.00 |
We present an online lightweight auto-tuning system for shared-memory parallel programs. We employ an online adaptive tuning algorithm that is based on performance measurements, to adapt to performance variability that arises during program execution. We address the impact of synchronous vs. asynchronous interactions between the application and the tuning system, and describe an adaptive approach that benefits from the improvements provided by both options. We presented a performance study of the online tuning system, and compared it to synchronous tuning systems. Finally, AARTS is evaluated under different scenarios, showing the potential benefits of using online tuning and the ability of AARTS to exploit those benefits.