This paper is concerned with integrating global data transformations and local loop transformations in order to minimize overhead on distributed shared-memory machines such as the SGI Origin 2000. By first developing an extended algebraic transformation framework, the paper presents a new technique that allows global data transformations, such as partitioning, to be applied statically to reshaped arrays. This eliminates the need for expensive temporary copies and hence the communication and synchronization those copies would otherwise incur. In addition, by integrating loop and data transformations, any poor spatial locality and expensive array subscripts introduced by the data transformations can be eliminated. A specific optimization algorithm is derived and applied to well-known benchmarks, where it is shown to give a significant improvement in execution time over existing approaches.
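The interaction between a partitioning data transformation and a matching loop transformation can be illustrated with a minimal sketch. It assumes a one-dimensional array block-partitioned over `nprocs` processors by a strip-mining reshape, so that `a[i]` becomes `a2[i // b][i % b]` with `b = ceil(n / nprocs)`. The function names are hypothetical, and this is not the paper's algorithm, which works in a more general algebraic framework. Applying the data transformation alone leaves the original loop paying a div and a mod on every access; strip-mining the loop in the same way (`i = p * b + j`) turns the subscripts back into plain loop indices.

```python
from typing import Callable, List


def block_size(n: int, nprocs: int) -> int:
    """Block size b = ceil(n / nprocs) for a block partition."""
    return -(-n // nprocs)


def partition(a: List[float], nprocs: int) -> List[List[float]]:
    """Data transformation: reshape a 1-D array into nprocs contiguous blocks.

    a[i] moves to a2[i // b][i % b]; the tail of the last block is padded with 0.0.
    """
    b = block_size(len(a), nprocs)
    a2 = [[0.0] * b for _ in range(nprocs)]
    for i, x in enumerate(a):
        a2[i // b][i % b] = x
    return a2


def update_data_only(a2: List[List[float]], n: int, f: Callable[[int], float]) -> None:
    """Original loop over i on the reshaped array: every access needs a div and a mod."""
    b = len(a2[0])
    for i in range(n):
        a2[i // b][i % b] = f(i)


def update_data_and_loop(a2: List[List[float]], n: int, f: Callable[[int], float]) -> None:
    """Loop strip-mined to match the partition (i = p * b + j).

    The subscripts are now plain loop indices, and iteration p of the outer
    loop touches only block p, so it could run on the processor owning that block.
    """
    b = len(a2[0])
    for p in range(len(a2)):
        # The last block may be partial; a non-positive bound yields an empty range.
        for j in range(min(b, n - p * b)):
            a2[p][j] = f(p * b + j)
```

For example, with `n = 10` and `nprocs = 3`, both update functions leave identical contents in the partitioned array, but only the strip-mined version avoids the div/mod subscript arithmetic inside the loop body.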