Modern compilers contain a large number of optimizations. As a result, code performance has become increasingly hard to guarantee: it is sensitive to butterfly effects and difficult to assess without extensive tuning and experimentation. The three fundamental steps of code optimization are to detect, understand, and fix potential performance problems. Precisely assessing the quality of compiled code is essential to delivering high performance. Today this issue is mostly tackled with hardware counters and dynamic profiling. Static analysis, as we aim to illustrate in this paper, can achieve similar results at a much lower cost and with better accuracy. We propose a different static/dynamic combination to achieve a trade-off between code quality and performance. Our approach uses static and dynamic analysis to automatically select good compiler optimization settings for each inner loop.
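As an illustration only (a hypothetical sketch, not the paper's actual tool), per-inner-loop selection of compiler settings can be framed as picking, for each loop, the candidate variant with the lowest cost, where the cost is either predicted by static analysis or measured by dynamic profiling. The loop names, flag combinations, and cost numbers below are made up for the example:

```python
# Hypothetical sketch: choose, per inner loop, the optimization setting
# with the lowest estimated cost (e.g. predicted or measured cycles).
# All loop names, flags, and numbers here are illustrative assumptions.

def select_best_settings(costs):
    """costs maps each inner loop to {setting: estimated_cost};
    return the cheapest setting for each loop."""
    return {loop: min(variants, key=variants.get)
            for loop, variants in costs.items()}

# Example cost table, indexed by inner loop, then by candidate setting.
costs = {
    "loop1": {"-O2": 120.0, "-O2 -funroll-loops": 95.0, "-O3": 100.0},
    "loop2": {"-O2": 80.0,  "-O2 -funroll-loops": 85.0, "-O3": 78.0},
}

best = select_best_settings(costs)
print(best)
```

In a real system the cost table would be filled by the static model where it is accurate, falling back to dynamic measurement where it is not; the selection step itself stays this simple.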