Localizing Non-Affine Array References

Authors:
Nicholas Mitchell;Larry Carter;Jeanne Ferrante
Venue:
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Year:
1999

Cited by: 31

ILP versus TLP on SMT
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing

A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors
Proceedings of the 14th international conference on Supercomputing

CROPS: coordinated restructuring of programs and storage
ACM SIGSOFT Software Engineering Notes

Evaluating the impact of memory system performance on software prefetching and locality optimizations
ICS '01 Proceedings of the 15th international conference on Supercomputing

Data Relation Vectors: A New Abstraction for Data Optimizations
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference

Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings
International Journal of Parallel Programming

Rescheduling for Locality in Sparse Matrix Computations
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I

Improving Locality for Adaptive Irregular Scientific Codes
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers

Reducing Communication Cost for Parallelizing Irregular Scientific Codes
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing

A Comparison of Locality Transformations for Irregular Codes
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers

Compile-time composition of run-time data and iteration reorderings
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation

Array Regrouping and Its Use in Compiling Data-Intensive Embedded Applications
IEEE Transactions on Computers

Array regrouping and structure splitting using whole-program reference affinity
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation

Restructuring computations for temporal data cache locality
International Journal of Parallel Programming

Quasidynamic Layout Optimizations for Improving Data Locality
IEEE Transactions on Parallel and Distributed Systems

Metrics and models for reordering transformations
MSP '04 Proceedings of the 2004 workshop on Memory system performance

Parallel techniques in irregular codes: cloth simulation as case of study
Journal of Parallel and Distributed Computing

Sparse Tiling for Stationary Iterative Methods
International Journal of High Performance Computing Applications

Improving the computational intensity of unstructured mesh applications
Proceedings of the 19th annual international conference on Supercomputing

Optimizing irregular shared-memory applications for distributed-memory systems
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming

Exploiting Locality for Irregular Scientific Codes
IEEE Transactions on Parallel and Distributed Systems

Runtime characterisation of irregular accesses applied to parallelisation of irregular reductions
International Journal of Computational Science and Engineering

A comparison of programming models for multiprocessors with explicitly managed memory hierarchies
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming

Evaluation of Hierarchical Mesh Reorderings
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I

Exploring a Novel Gathering Method for Finite Element Codes on the Cell/B.E. Architecture
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

Region-based parallelization of irregular reductions on explicitly managed memory hierarchies
The Journal of Supercomputing

An execution strategy and optimized runtime support for parallelizing irregular reductions on modern GPUs
Proceedings of the international conference on Supercomputing

Combining performance aspects of irregular gauss-seidel via sparse tiling
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing

Automatically enhancing locality for tree traversals with traversal splicing
Proceedings of the ACM international conference on Object oriented programming systems languages and applications

Code generation for parallel execution of a class of irregular loops on distributed memory systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Non-affine Extensions to Polyhedral Code Generation
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization



Abstract

Existing techniques can enhance the locality of arrays indexed by affine functions of induction variables. This paper presents a technique to localize non-affine array references, such as the indirect memory references common in sparse-matrix computations. Our optimization combines elements of tiling, data-centric tiling, data remapping and inspector-executor parallelization. We describe our technique, bucket tiling, which includes the tasks of permutation generation, data remapping, and loop regeneration. We show that profitability cannot generally be determined at compile-time, but requires an extension to run-time. We demonstrate our technique on three codes: integer sort, conjugate gradient, and a kernel used in simulating a beating heart. We observe speedups of 1.91 on integer sort, 1.57 on conjugate gradient, and 2.69 on the heart kernel.
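
The three tasks named in the abstract (permutation generation, data remapping, loop regeneration) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `bucket_tile`, its parameters, and the choice of a counting sort for the permutation are assumptions made for illustration. The sketch assumes the original loop is a reduction of the form `out[idx[i]] += values[i]`, so its iterations can be safely reordered.

```python
def bucket_tile(idx, values, n_targets, bucket_size):
    """Compute out[idx[i]] += values[i] in a bucketed iteration order.

    Hypothetical sketch: iterations are grouped by the region (bucket) of
    `out` that they touch, so each bucket's working set stays small.
    """
    n_buckets = (n_targets + bucket_size - 1) // bucket_size

    # 1. Permutation generation (inspector step): a stable counting sort of
    #    the iterations, keyed by the bucket each indirect reference hits.
    counts = [0] * (n_buckets + 1)
    for t in idx:
        counts[t // bucket_size + 1] += 1
    for b in range(n_buckets):
        counts[b + 1] += counts[b]
    starts = counts[:]        # bucket b owns positions starts[b]..starts[b+1]-1
    cursor = counts[:-1]      # next free position in each bucket
    perm = [0] * len(idx)
    for i, t in enumerate(idx):
        b = t // bucket_size
        perm[cursor[b]] = i
        cursor[b] += 1

    # 2. Data remapping: copy the per-iteration data into permuted order so
    #    the executor reads it sequentially.
    idx_r = [idx[i] for i in perm]
    val_r = [values[i] for i in perm]

    # 3. Loop regeneration (executor step): an outer loop over buckets and an
    #    inner loop over the iterations assigned to that bucket.
    out = [0] * n_targets
    for b in range(n_buckets):
        for k in range(starts[b], starts[b + 1]):
            out[idx_r[k]] += val_r[k]
    return out, perm


out, perm = bucket_tile([5, 0, 3, 5, 1, 4], [1, 2, 3, 4, 5, 6],
                        n_targets=6, bucket_size=2)
print(out)   # same result as the original, unbucketed loop
print(perm)  # iteration order grouped by bucket
```

The permutation and remapping steps add work of their own, which is consistent with the abstract's point that profitability depends on run-time information: whether the reordering pays off depends on the actual contents of `idx` and on how often the executor loop is reused. With floating-point values, reordering may also change rounding.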