Journal of Parallel and Distributed Computing
We consider the problem of automatically guiding program transformations for locality despite incomplete information caused by complicated program structures, changing target architectures, and unknown properties of the input data. Our system, the modal model of memory, uses limited static analysis and bounded runtime experimentation to produce performance formulas that can be used to make runtime locality transformation decisions. Static analysis is performed once per program to determine its memory reference properties in terms of modes, a small set of parameterized kernel reference patterns; the result is the program's mode tree. Once per architectural system, the system automatically performs a set of experiments to determine a family of kernel performance formulas. It can then use these kernel formulas to synthesize a performance formula for any program's mode tree. Finally, with program transformations represented as mappings between mode trees, the generated performance formulas can be used to guide transformation decisions.
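The pipeline described above (mode tree, kernel formulas, synthesized formula, transformation choice) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `Mode`, `KERNEL_FORMULAS`, `estimate`, and `choose`, the two mode kinds, the invented cost numbers standing in for experimentally measured kernel formulas, and the simple rule used to combine a node's cost with its children's.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical kernel formulas: estimated cost per reference as a function of
# a mode's parameters. In the described system these would be derived from
# per-architecture experiments; the constants here are invented placeholders.
KernelFormula = Callable[[Dict[str, int]], float]

KERNEL_FORMULAS: Dict[str, KernelFormula] = {
    "stride": lambda p: 1.0 if p["step"] <= 1 else 4.0,
    "random": lambda p: 2.0 if p["footprint"] <= 1024 else 20.0,
}


@dataclass
class Mode:
    """One node of a (hypothetical) mode tree."""
    kind: str                      # which kernel reference pattern
    params: Dict[str, int]         # parameters of that pattern
    count: int = 1                 # how many times the pattern repeats
    children: List["Mode"] = field(default_factory=list)


def estimate(node: Mode) -> float:
    """Synthesize a cost for a mode tree.

    Assumed composition rule: a node costs its own kernel cost plus the
    cost of its children, all multiplied by its repetition count.
    """
    own = KERNEL_FORMULAS[node.kind](node.params)
    return node.count * (own + sum(estimate(c) for c in node.children))


def choose(candidates: Dict[str, Mode]) -> str:
    """Pick the candidate mode tree with the lowest estimated cost."""
    return min(candidates, key=lambda name: estimate(candidates[name]))


# A transformation is modeled as a mapping from one mode tree to another:
# here, a reordering that turns random accesses into unit-stride accesses.
original = Mode("random", {"footprint": 4096}, count=1000)
reordered = Mode("stride", {"step": 1}, count=1000)

best = choose({"original": original, "reordered": reordered})
```

With these placeholder formulas the reordered tree is estimated to be cheaper, so `choose` returns `"reordered"`; on a different architecture, a different set of kernel formulas could reverse that decision without re-analyzing the program.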