Past compilers have found it challenging to implement Fortran 90 array language on symmetric shared-memory multiprocessors (SMPs) so as to match, let alone beat, the performance of comparable Fortran 77 scalar loops. This is despite the fact that the semantics of array language are implicitly concurrent, whereas the semantics of scalar loops are implicitly sequential. A well-known obstacle to efficient execution of array language is the overhead of the array temporaries needed to obey its fetch-before-store semantics. We observe that another major obstacle is that most past compilers attempted to compile and optimize each array statement in isolation. In this paper, we describe a solution for optimized compilation of Fortran 90 array language for execution on SMPs. Our solution optimizes scalarized loops and scalar loops in a common framework. It also adapts past work on array temporary minimization so as to avoid degrading parallelism and locality. The solution has been implemented in the IBM XL Fortran product compiler for SMPs. To the best of our knowledge, no other Fortran 90 compiler performs such combined optimizations of scalarized loops and scalar loops. Our preliminary experimental results indicate that the performance of Fortran 90 array language can match, and even beat, the performance of comparable scalar loops. In addition to Fortran 90 array language, the approach outlined in this paper will be relevant to similar array language extensions that might appear in Java and other programming languages in the future.
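The fetch-before-store obstacle mentioned above can be illustrated with a small sketch. It is written in Python rather than Fortran, uses hypothetical function names, and models the array statement `A(2:N) = A(1:N-1)` with plain lists. It is an illustration of the general issue only, not the algorithm implemented in the XL Fortran compiler.

```python
def shift_with_temporary(a):
    """Array-statement semantics: fetch the whole right-hand side, then store."""
    out = list(a)
    tmp = a[:-1]  # array temporary holding every fetched operand
    for i in range(1, len(a)):
        out[i] = tmp[i - 1]
    return out


def shift_naive_forward(a):
    """Naive scalarization without a temporary: violates fetch-before-store."""
    out = list(a)
    for i in range(1, len(out)):
        out[i] = out[i - 1]  # reads a value stored by the previous iteration
    return out


def shift_reversed(a):
    """Scalarization with a reversed loop: no temporary, same result as the array statement."""
    out = list(a)
    for i in range(len(out) - 1, 0, -1):
        out[i] = out[i - 1]  # each element is read before it is overwritten
    return out


data = [1, 2, 3, 4]
print(shift_with_temporary(data))
print(shift_naive_forward(data))
print(shift_reversed(data))
```

The reversed loop removes the temporary but carries an anti-dependence between iterations, so it cannot simply be run in parallel. This kind of trade-off is presumably why the abstract stresses minimizing temporaries without degrading parallelism and locality.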