End-user productivity is critical to harnessing the power of high performance computing systems to solve science and engineering problems. Bridging the gap between hardware complexity and software limitations remains a challenge. Despite significant progress in languages, compilers, and performance tools, tuning an application remains largely a manual task done mostly by experts. In this paper we propose a holistic approach to automated performance analysis and tuning that we expect to greatly improve the productivity of performance debugging. Our approach builds a framework that combines expert knowledge, compiler techniques, and performance research for performance diagnosis and solution discovery. Once a diagnosis and tuning strategy has been developed within the framework, it can be stored in an open and extensible database and reused in the future. We demonstrate the effectiveness of our approach through the automated performance analysis and tuning of two scientific applications. We show that the tuning process is highly automated and that the performance improvement is significant.