The proliferation of parallel platforms over the last ten years has been dramatic. Parallel platforms come in many flavors, including desktop multiprocessor PCs and workstations with a few processors, networks of PCs and workstations, and supercomputers with hundreds of processors or more. This diverse collection of parallel platforms provides not only computing cycles but also other resources important for scientific computing, such as large amounts of main memory and fast I/O capabilities. As a result, the "typical profile" of a potential user of such systems has changed considerably. The specialist user with a good understanding of the complexities of the target parallel system has been replaced by a user who is largely unfamiliar with the underlying system characteristics. While the specialist's main concern is peak performance, the non-specialist user may be willing to trade off performance for ease of programming.

Recent languages such as High Performance Fortran (HPF) and SGI Parallel Fortran are a significant step towards making parallel platforms truly usable for a broadening user community. However, non-trivial user input is still required to produce efficient parallel programs. The main challenge for a user is to understand the performance implications of a specified data layout, which requires knowledge of issues such as the code generation and analysis strategies of the HPF compiler and its node compiler, and the performance characteristics of the target architecture.

This paper discusses our preliminary experiences with the design and implementation of Fortran RED, a tool that supports Fortran as a deterministic, sequential programming model on different parallel target systems. The tool is not part of a compiler. Fortran RED uses HPF as its intermediate program representation, since HPF is portable across many parallel platforms and both commercial and research HPF compilers are widely available.
Fortran RED is able to support different target HPF compilers and target architectures, and it allows multi-dimensional distributions as well as dynamic remapping. This paper focuses on the performance prediction component of the tool and reports preliminary results for a single scientific kernel on two target systems: PGI's and IBM's HPF compilers, each with IBM's SP-2 as the target architecture.
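To illustrate the kind of data-layout decisions whose performance implications a user (or a tool such as Fortran RED) must reason about, the following is a minimal HPF fragment showing standard directives for a multi-dimensional distribution and dynamic remapping. The array names, sizes, and processor shape are illustrative only and are not taken from the paper:

```fortran
      REAL A(1024,1024), B(1024,1024)
!HPF$ PROCESSORS P(4,4)
! Two-dimensional block distribution of A over a 4x4 processor grid.
!HPF$ DISTRIBUTE A(BLOCK, BLOCK) ONTO P
! Co-locate B element-wise with A to avoid communication between them.
!HPF$ ALIGN B(I,J) WITH A(I,J)
! A may be remapped at run time, so it must be declared DYNAMIC.
!HPF$ DYNAMIC A
!     ... first computation phase favoring (BLOCK, BLOCK) ...
! Dynamic remapping: switch A to a cyclic distribution in the
! first dimension for a later phase with a different access pattern.
!HPF$ REDISTRIBUTE A(CYCLIC, BLOCK) ONTO P
```

Each such choice changes the communication generated by the HPF compiler, which is exactly the cost that a performance prediction component must estimate.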