This article critically examines current parallel programming practice and optimizing compiler development. The general strategies employed by compiler and programmer to optimize a Fortran program are described, and then illustrated for a specific case by applying them to a well-known scientific program, TRED2, with the KSR-1 as the target architecture. Extensive measurements are taken of the resulting versions of the program, which are compared with a version produced by a commercial optimizing compiler, KAP. The compiler strategy significantly outperforms KAP and falls only slightly short of the performance achieved by the programmer. Following the experimental section, each approach is critiqued from the perspective of the other: perceived flaws, advantages, and common ground are outlined, with an eye to improving both schemes.