Awareness of the memory hierarchy is an important requirement in the design of high-performance programs. We describe a tool that supports the programmer in restructuring performance-critical code sections. The tool works on small program instances, which are obtained by fixing program parameters such as loop bounds and rewriting the program as a sequence of operations. The tool automatically reorders the operations for better locality while respecting data dependences, and it outputs the optimized program instance in a structured form. Finally, the user recognizes the locality-relevant structure and generalizes it to the full program. This paper focuses on recent advances in the development of our method. In particular, we introduce a hierarchical clustering scheme that highlights operation subsequences with a high degree of data reuse. The scheme is applied to generate structured, optimized program instances in which the locality-relevant structure is easy to recognize. Experimental results are included.
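To make the idea of reordering a program instance concrete, the following is a minimal sketch and not the tool described above. It assumes a toy program (two loops, `b[i] = 2*a[i]` followed by `c[i] = b[i] + 1`) with the loop bound fixed to a small value, represents each operation by hypothetical read and write sets, and uses a simple greedy heuristic in place of the paper's reordering and hierarchical clustering scheme, which the abstract does not specify. All function names (`make_instance`, `depends`, `reorder`, `lru_misses`) and the LRU cache model are illustrative assumptions.

```python
from collections import OrderedDict


def make_instance(n):
    """Operation sequence for: for i: b[i] = 2*a[i]; then for i: c[i] = b[i] + 1.

    Each operation is (name, read_set, write_set). This encoding is an
    assumption made for illustration only.
    """
    ops = [(f"b{i}", {f"a{i}"}, {f"b{i}"}) for i in range(n)]
    ops += [(f"c{i}", {f"b{i}"}, {f"c{i}"}) for i in range(n)]
    return ops


def depends(earlier, later):
    """True if `later` must stay after `earlier` (RAW, WAW, or WAR conflict)."""
    _, r1, w1 = earlier
    _, r2, w2 = later
    return bool(w1 & (r2 | w2)) or bool(r1 & w2)


def reorder(ops):
    """Greedy heuristic: among operations whose predecessors are all scheduled,
    pick the one sharing the most data with the operation scheduled last."""
    n = len(ops)
    preds = [{i for i in range(j) if depends(ops[i], ops[j])} for j in range(n)]
    done, order, last = set(), [], set()
    while len(order) < n:
        ready = [j for j in range(n) if j not in done and preds[j] <= done]
        # max() returns the first maximal element, so ties keep original order.
        best = max(ready, key=lambda j: len((ops[j][1] | ops[j][2]) & last))
        done.add(best)
        order.append(ops[best])
        last = ops[best][1] | ops[best][2]
    return order


def lru_misses(ops, capacity):
    """Count misses of a fully associative LRU cache holding `capacity` items."""
    cache, misses = OrderedDict(), 0
    for _, reads, writes in ops:
        for loc in sorted(reads) + sorted(writes):
            if loc in cache:
                cache.move_to_end(loc)  # mark as most recently used
            else:
                misses += 1
                cache[loc] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return misses


if __name__ == "__main__":
    original = make_instance(4)
    optimized = reorder(original)
    print([name for name, _, _ in optimized])
    print(lru_misses(original, 2), lru_misses(optimized, 2))
```

On this instance with a two-element cache, the greedy order interleaves the two loops (`b0, c0, b1, c1, ...`) and the miss count drops from 16 to 12. The interleaved output is the kind of regular structure a user could recognize as loop fusion and carry back to the parameterized program.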