SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Proceedings of the 14th international conference on Supercomputing
CROPS: coordinated restructuring of programs and storage
ACM SIGSOFT Software Engineering Notes
ICS '01 Proceedings of the 15th international conference on Supercomputing
Data Relation Vectors: A New Abstraction for Data Optimizations
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
International Journal of Parallel Programming
Rescheduling for Locality in Sparse Matrix Computations
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Improving Locality for Adaptive Irregular Scientific Codes
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Reducing Communication Cost for Parallelizing Irregular Scientific Codes
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
A Comparison of Locality Transformations for Irregular Codes
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Compile-time composition of run-time data and iteration reorderings
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Array Regrouping and Its Use in Compiling Data-Intensive Embedded Applications
IEEE Transactions on Computers
Array regrouping and structure splitting using whole-program reference affinity
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Restructuring computations for temporal data cache locality
International Journal of Parallel Programming
Quasidynamic Layout Optimizations for Improving Data Locality
IEEE Transactions on Parallel and Distributed Systems
Metrics and models for reordering transformations
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Parallel techniques in irregular codes: cloth simulation as case of study
Journal of Parallel and Distributed Computing
Sparse Tiling for Stationary Iterative Methods
International Journal of High Performance Computing Applications
Improving the computational intensity of unstructured mesh applications
Proceedings of the 19th annual international conference on Supercomputing
Optimizing irregular shared-memory applications for distributed-memory systems
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Exploiting Locality for Irregular Scientific Codes
IEEE Transactions on Parallel and Distributed Systems
Runtime characterisation of irregular accesses applied to parallelisation of irregular reductions
International Journal of Computational Science and Engineering
A comparison of programming models for multiprocessors with explicitly managed memory hierarchies
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Evaluation of Hierarchical Mesh Reorderings
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Exploring a Novel Gathering Method for Finite Element Codes on the Cell/B.E. Architecture
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Region-based parallelization of irregular reductions on explicitly managed memory hierarchies
The Journal of Supercomputing
Proceedings of the international conference on Supercomputing
Combining performance aspects of irregular gauss-seidel via sparse tiling
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Automatically enhancing locality for tree traversals with traversal splicing
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Code generation for parallel execution of a class of irregular loops on distributed memory systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Non-affine Extensions to Polyhedral Code Generation
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Hi-index | 0.01 |
Existing techniques can enhance the locality of arrays indexed by affine functions of induction variables. This paper presents a technique to localize non-affine array references, such as the indirect memory references common in sparse-matrix computations. Our optimization combines elements of tiling, data-centric tiling, data remapping and inspector-executor parallelization.We describe our technique, bucket tiling, which includes the tasks of permutation generation, data remapping, and loop regeneration. We show that profitability cannot generally be determined at compile-time, but requires an extension to run-time. We demonstrate our technique on three codes: integer sort, conjugate gradient, and a kernel used in simulating a beating heart. We observe speedups of 1.91 on integer sort, 1.57 on conjugate gradient, and 2.69 on the heart kernel.