A unified framework for systematic loop transformations
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
A general framework for iteration-reordering loop transformations
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
A singular loop transformation framework based on non-singular matrices
International Journal of Parallel Programming
Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Nonlinear array dependence analysis
Journal of Parallel and Distributed Computing
A unifying framework for iteration reordering transformations
Run-time methods for parallelizing partially parallel loops
ICS '95 Proceedings of the 9th international conference on Supercomputing
Combining loop transformations considering caches and scheduling
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Improving locality using loop and data transformations in an integrated framework
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Improving memory hierarchy performance for irregular applications
ICS '99 Proceedings of the 13th international conference on Supercomputing
Synthesizing transformations for locality enhancement of imperfectly-nested loop nests
Proceedings of the 14th international conference on Supercomputing
DyC: an expressive annotation-directed dynamic compiler for C
Theoretical Computer Science - Partial evaluation and semantics-based program manipulation
Automatic parallelization of irregular applications
Parallel Computing - special issue on parallel computing for irregular applications
Transformations for imperfectly nested loops
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
A unified framework for schedule and storage optimization
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Hybrid analysis: static & dynamic memory reference analysis
ICS '02 Proceedings of the 16th international conference on Supercomputing
Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Rescheduling for Locality in Sparse Matrix Computations
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Finding Legal Reordering Transformations Using Mappings
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Reducing the bandwidth of sparse symmetric matrices
ACM '69 Proceedings of the 1969 24th national conference
Localizing Non-Affine Array References
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Memory Hierarchy Management for Iterative Graph Structures
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Combining performance aspects of irregular Gauss-Seidel via sparse tiling
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Parallel reductions: an application of adaptive algorithm selection
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Array regrouping and structure splitting using whole-program reference affinity
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
The Energy Impact of Aggressive Loop Fusion
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
The Potential of Computation Regrouping for Improving Locality
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Automatic Support for Irregular Computations in a High-Level Language
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Metrics and models for reordering transformations
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Improving the computational intensity of unstructured mesh applications
Proceedings of the 19th annual international conference on Supercomputing
A hierarchical model of data locality
Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Reuse analysis of indirectly indexed arrays
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Exploiting Locality for Irregular Scientific Codes
IEEE Transactions on Parallel and Distributed Systems
Predicting locality phases for dynamic memory optimization
Journal of Parallel and Distributed Computing
Forma: A framework for safe automatic array reshaping
ACM Transactions on Programming Languages and Systems (TOPLAS)
PEAK—a fast and effective performance tuning system via compiler optimization orchestration
ACM Transactions on Programming Languages and Systems (TOPLAS)
Phase-based adaptive recompilation in a JVM
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
An analytical model of locality-based parallel irregular reductions
Parallel Computing
A component model of spatial locality
Proceedings of the 2009 international symposium on Memory management
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Region-based parallelization of irregular reductions on explicitly managed memory hierarchies
The Journal of Supercomputing
Task ordering and memory management problem for degree of parallelism estimation
COCOON'11 Proceedings of the 17th annual international conference on Computing and combinatorics
Safe parallel programming using dynamic dependence hints
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Mesh independent loop fusion for unstructured mesh applications
Proceedings of the 9th conference on Computing Frontiers
Code generation for parallel execution of a class of irregular loops on distributed memory systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Towards data tiling for whole programs in scratchpad memory allocation
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Non-affine Extensions to Polyhedral Code Generation
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Many important applications, such as those using sparse data structures, have memory reference patterns that are unknown at compile time. Prior work has developed run-time reorderings of data and computation that enhance locality in such applications.

This paper presents a compile-time framework that allows the explicit composition of run-time data and iteration-reordering transformations. Our framework builds on the iteration-reordering framework of Kelly and Pugh to represent the effects of a given composition. To motivate our extension, we show that new compositions of run-time reordering transformations can result in better performance on three benchmarks.

We show how to express a number of run-time data and iteration-reordering transformations that focus on improving data locality. We also describe the space of possible run-time reordering transformations and how existing transformations fit within it. Because sparse tiling techniques are included in our framework, they become more generally applicable, both to a larger class of applications and in composition with other reordering transformations. Finally, within the presented framework, data need be remapped only once at run time for a given composition, which is one example of the overhead reductions the framework can express.
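
To make the idea of composing a run-time data reordering with a run-time iteration reordering concrete, the following is a minimal sketch and not the framework described in the abstract. It assumes a simple irregular reduction loop over index arrays `left` and `right`, a first-touch packing heuristic for the data reordering, and a lexicographic sort for the iteration reordering. All function names are hypothetical and chosen for illustration only. The point it illustrates is that the inspectors only build a permutation `sigma` and an iteration order, and the data array is physically remapped once at the end.

```python
def first_touch_order(left, right, n_data):
    """Inspector 1 (data reordering): pack data in order of first access.

    Returns sigma, where sigma[old_location] = new_location.
    """
    sigma = [-1] * n_data
    next_slot = 0
    for l, r in zip(left, right):
        for d in (l, r):
            if sigma[d] == -1:
                sigma[d] = next_slot
                next_slot += 1
    # Locations the loop never touches are placed at the end.
    for d in range(n_data):
        if sigma[d] == -1:
            sigma[d] = next_slot
            next_slot += 1
    return sigma


def sort_iterations(left, right, sigma):
    """Inspector 2 (iteration reordering): order iterations by the new
    data locations they touch. Returns old iteration ids in new order.

    Assumption: the loop body is a reduction, so any iteration order is legal.
    """
    def key(i):
        a, b = sigma[left[i]], sigma[right[i]]
        return (min(a, b), max(a, b))
    return sorted(range(len(left)), key=key)


def remap_once(val, left, right, sigma, order):
    """Apply the composed reorderings: a single pass over the data array."""
    new_val = [None] * len(val)
    for old, new in enumerate(sigma):
        new_val[new] = val[old]
    new_left = [sigma[left[i]] for i in order]
    new_right = [sigma[right[i]] for i in order]
    return new_val, new_left, new_right


def executor(val, left, right):
    """The irregular loop itself: each pair (l, r) updates both endpoints."""
    acc = [0] * len(val)
    for l, r in zip(left, right):
        acc[l] += val[r]
        acc[r] += val[l]
    return acc


if __name__ == "__main__":
    val = [10, 20, 30, 40, 50]
    left = [4, 0, 3, 1]
    right = [1, 3, 0, 4]

    sigma = first_touch_order(left, right, len(val))
    order = sort_iterations(left, right, sigma)
    new_val, new_left, new_right = remap_once(val, left, right, sigma, order)

    reordered = executor(new_val, new_left, new_right)
    # Map results back to the original numbering for comparison.
    restored = [reordered[sigma[old]] for old in range(len(val))]
    print(restored == executor(val, left, right))
```

In this sketch, a further data reordering could be composed by updating `sigma` alone before calling `remap_once`, so the data array would still be moved only once.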