Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Solving problems on concurrent processors. Vol. 1: General techniques and regular problems
Solving problems on concurrent processors. Vol. 1: General techniques and regular problems
Process decomposition through locality of reference
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Updating distributed variables in local computations
Concurrency: Practice and Experience
Programming data parallel algorithms on distributed memory using Kali
ICS '91 Proceedings of the 5th international conference on Supercomputing
Analysis and transformation in the ParaScope editor
ICS '91 Proceedings of the 5th international conference on Supercomputing
A static performance estimator to guide data partitioning decisions
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Scan primitives for vector computers
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Fortran at ten gigaflops: the connection machine convolution compiler
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
The DINO parallel programming language
Journal of Parallel and Distributed Computing
Compiler optimizations for Fortran D on MIMD distributed-memory machines
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Computer support for machine-independent parallel programming in Fortran D
Languages, compilers and run-time environments for distributed memory machines
Optimizing for parallelism and data locality
ICS '92 Proceedings of the 6th international conference on Supercomputing
Parallelization of FORTRAN code on distributed-memory parallel processors
ICS '90 Proceedings of the 4th international conference on Supercomputing
Pandore: a system to manage data distribution
ICS '90 Proceedings of the 4th international conference on Supercomputing
Dependence graphs and compiler optimizations
POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
An Implementation of Interprocedural Bounded Regular Section Analysis
IEEE Transactions on Parallel and Distributed Systems
Compiling Communication-Efficient Programs for Massively Parallel Machines
IEEE Transactions on Parallel and Distributed Systems
Data-Parallel Programming on MIMD Computers
IEEE Transactions on Parallel and Distributed Systems
Compile-Time Estimation of Communication Costs on Multicomputers
IPPS '92 Proceedings of the 6th International Parallel Processing Symposium
An Overview of the Fortran D Programming System
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Compiling Fortran D for MIMD distributed-memory machines
Communications of the ACM
Interprocedural compilation of Fortran D for MIMD distributed-memory machines
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
A static parameter based performance prediction tool for parallel programs
ICS '93 Proceedings of the 7th international conference on Supercomputing
Compiler and runtime support for structured and block structured applications
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Unified compilation of Fortran 77D and 90D
ACM Letters on Programming Languages and Systems (LOPLAS)
Precise compile-time performance prediction for superscalar-based computers
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
GIVE-N-TAKE—a balanced code placement framework
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Index array flattening through program transformation
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Controlling application grain size on a network of workstations
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
mpC: a multi-paradigm programming language for massively parallel computers
ACM SIGPLAN Notices
The effect of interrupts on software pipeline execution on message-passing architectures
ICS '96 Proceedings of the 10th international conference on Supercomputing
Interprocedural Partial Redundancy Elimination With Application to Distributed Memory Compilation
IEEE Transactions on Parallel and Distributed Systems
A balanced code placement framework
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiler optimization of dynamic data distributions for distributed-memory multicomputers
Compiler optimizations for scalable parallel systems
Runtime and compiler support for irregular computations
Compiler optimizations for scalable parallel systems
Language Support for Pipelining Wavefront Computations
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Compiler Synthesis of Task Graphs for Parallel Program Performance Prediction
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Generating Realignment-Based Communication for HPF Programs
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Using cache optimizing compiler for managing software cache on distributed shared memory system
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Program Partitioning Optimizations in an HPF Prototype Compiler
COMPSAC '96 Proceedings of the 20th Conference on Computer Software and Applications
The rise and fall of High Performance Fortran: an historical object lesson
Proceedings of the third ACM SIGPLAN conference on History of programming languages
NUMACROS: data parallel programming on NUMA multiprocessors
Sedms'93 USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems - Volume 4
Optimizing the use of static buffers for DMA on a CELL chip
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Hi-index | 0.02 |
The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a number of advanced optimizations needed to achieve acceptable performance; they are analyzed and empirically evaluated for stencil computations. Profitability formulas are derived for each optimization. Results show that exploiting parallelism for pipelined computations, reductions, and scans is vital. Message vectorization, collective communication, and efficient coarse-grain pipelining also significantly affect performance.