CONPAR 90 Proceedings of the joint international conference on Vector and parallel processing
Loop partitioning for distributed memory multiprocessors as unimodular transformations
ICS '91 Proceedings of the 5th international conference on Supercomputing
The parallel execution of DO loops
Communications of the ACM
Dependence Analysis for Supercomputing
Dependence Analysis for Supercomputing
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Static analysis of upper and lower bounds on dependences and parallelism
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimizing Computational and Spatial Overheads in Complex Transformed Loops
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Automatic data mapping of signal processing applications
ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
We present a computationally efficient method for deriving the most appropriate transformation and mapping of a nested loop for a given hierarchical parallel machine. The method builds on our systematic and general theory of unimodular loop transformations for the problem of iteration space partitioning [7]. Finding an optimal mapping, or an optimal associated unimodular transformation, is NP-complete. We therefore present a polynomial-time method for obtaining a ‘good’ transformation using a simple parameterized model of the hierarchical machine, and we outline a systematic methodology for obtaining the most appropriate mapping.
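For readers unfamiliar with the term, the following minimal sketch illustrates what a unimodular loop transformation is in general. It is not the method of this paper. It assumes the standard textbook definitions: a transformation is an integer matrix with determinant +1 or -1, and it is legal when every dependence distance vector stays lexicographically positive after transformation. All function names (`is_unimodular`, `is_legal`, and so on) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant of a square matrix via Gaussian elimination."""
    n = len(matrix)
    rows = [[Fraction(x) for x in row] for row in matrix]
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= rows[col][col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n):
                rows[r][c] -= factor * rows[col][c]
    return det


def is_unimodular(matrix):
    """True if the matrix has integer entries and determinant +1 or -1."""
    all_int = all(isinstance(x, int) for row in matrix for x in row)
    return all_int and abs(determinant(matrix)) == 1


def apply(matrix, vector):
    """Matrix-vector product, used to map iteration or distance vectors."""
    return tuple(sum(a * b for a, b in zip(row, vector)) for row in matrix)


def is_lex_positive(vector):
    """True if the first nonzero component of the vector is positive."""
    for x in vector:
        if x != 0:
            return x > 0
    return False


def is_legal(matrix, dependences):
    """Assumed legality test: unimodular, and all dependences stay lex-positive."""
    return is_unimodular(matrix) and all(
        is_lex_positive(apply(matrix, d)) for d in dependences
    )


skew = [[1, 1], [0, 1]]         # loop skewing of a doubly nested loop
interchange = [[0, 1], [1, 0]]  # loop interchange
scale = [[2, 0], [0, 1]]        # determinant 2, so not unimodular
```

With these definitions, `is_legal(skew, [(1, 0), (0, 1)])` holds, while `is_legal(interchange, [(1, -1)])` does not, because interchange maps the distance vector `(1, -1)` to `(-1, 1)`. Choosing among the legal transformations for a particular machine model is the optimization problem the abstract describes; this sketch only checks candidates.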