Program performance optimization often involves choosing the right parameters to minimize the program's runtime. Selecting optimization parameters by execution-driven search is guaranteed to find excellent results because it accurately accounts for all performance components of the target platform. The major drawback of the execution-driven approach is its excessive compilation time, caused by thousands of runs of the original program. In this article, we propose a novel technique called program reduction transformations to reduce the cost of execution-driven optimization parameter selection. It is based on our observations of the characteristics of scientific applications and of the optimization parameter selection task. The idea is to transform the program before it is used in the execution-driven parameter selection procedure. The transformed program runs in a much shorter time but preserves the parameter selection quality. This technique greatly reduces the time spent evaluating each candidate parameter and makes execution-driven optimization parameter selection affordable. We formulate the theoretical foundation of program reduction transformations and identify several situations, common in scientific applications, where reduction transformations can be legally applied. Experiments on two math kernels and three SPEC benchmarks show that our approach is both feasible and effective.
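As a rough illustration of the idea, the sketch below runs an execution-driven search over tile sizes for a blocked matrix multiply, optionally on a shortened version of the kernel. The abstract does not say which reduction transformations the article uses, so the one shown here (computing only the first `rows` output rows) is an assumption chosen for simplicity, not the authors' method. The names `tiled_matmul`, `select_tile`, `rows`, and `timer` are hypothetical.

```python
import time


def tiled_matmul(a, b, n, tile, rows=None):
    """Blocked n x n matrix multiply on lists of lists.

    `rows` is a hypothetical reduction knob (an assumption of this sketch):
    when set, only output rows [0, rows) are computed, so every tile size
    is timed on the same, smaller amount of work.
    """
    rows = n if rows is None else min(rows, n)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, rows, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, rows)):
                    ci, ai = c[i], a[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik, bk = ai[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            ci[j] += aik * bk[j]
    return c


def select_tile(candidates, n, rows=None, timer=time.perf_counter):
    """Execution-driven search: time each candidate tile size, keep the fastest."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best_tile, best_time = None, float("inf")
    for tile in candidates:
        start = timer()
        tiled_matmul(a, b, n, tile, rows)
        elapsed = timer() - start
        if elapsed < best_time:
            best_tile, best_time = tile, elapsed
    return best_tile


if __name__ == "__main__":
    # Full program versus reduced program (first 16 of 64 rows only).
    print("full:   ", select_tile([4, 8, 16, 32], n=64))
    print("reduced:", select_tile([4, 8, 16, 32], n=64, rows=16))
```

Whether the reduced run picks the same tile size as the full run is exactly the "selection quality" question the abstract raises. This sketch does not guarantee it, and pure-Python timings are noisy, so repeated measurements would be needed before drawing any conclusion.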