Array languages such as Fortran 90, HPF, and ZPL simplify array-based computations and make data parallelism easy to express. However, they can suffer large performance penalties because they introduce intermediate arrays, both at the source level and during compilation, which increase memory usage and pollute the cache. Most compilers address this problem by simply scalarizing the array language and relying on a scalar language compiler to perform loop fusion and array contraction. We instead show that there are advantages to performing a form of loop fusion and array contraction at the array level. This paper describes this approach and explains its advantages. Experimental results show that our scheme typically yields runtime improvements of greater than 20% and sometimes up to 400%. In addition, it yields superior memory use when compared against commercial compilers and exhibits comparable memory use when compared with scalar languages. We also explore the interaction between these transformations and communication optimizations.
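As a rough illustration of the two transformations named above, the following Python sketch contrasts a naively scalarized pair of array statements with a fused and contracted version. It is not the paper's array-level algorithm; it only shows the effect that loop fusion and array contraction aim for. The statements `T = A + B` and `C = T * D`, and the function names `scalarized_naive` and `fused_and_contracted`, are assumptions chosen for the example.

```python
def scalarized_naive(a, b, d):
    """One loop per array statement; T is a full-size intermediate array."""
    n = len(a)
    t = [0.0] * n
    for i in range(n):          # loop generated for T = A + B
        t[i] = a[i] + b[i]
    c = [0.0] * n
    for i in range(n):          # loop generated for C = T * D
        c[i] = t[i] * d[i]
    return c


def fused_and_contracted(a, b, d):
    """Both statements share one loop; T is contracted to a scalar."""
    n = len(a)
    c = [0.0] * n
    for i in range(n):
        t = a[i] + b[i]         # T no longer needs array storage
        c[i] = t * d[i]
    return c


if __name__ == "__main__":
    a, b, d = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [2.0, 2.0, 2.0]
    print(scalarized_naive(a, b, d) == fused_and_contracted(a, b, d))
```

Both versions compute the same result, but the second makes a single pass over the data and allocates no array for `T`, which is the memory and cache benefit the abstract refers to.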