This paper presents a Parameter based Performance Prediction Tool (PPPT), which is part of the Vienna Fortran Compilation System (VFCS), a compiler that automatically translates Fortran programs into message passing programs for massively parallel architectures.

The PPPT is applied to an explicitly parallel program generated by the VFCS, which may contain synchronous as well as asynchronous communication and is attributed with parameters computed in a previous profiling run. It statically computes a set of optional parameters that characterize the behavior of the parallel program: work distribution, the number of data transfers, the amount of data transferred, transfer times, network contention, and the number of cache misses. These parameters can be selectively determined for statements, loops, procedures, and the entire program; furthermore, their effect with respect to individual processors can be examined.

The tool plays an important role in the VFCS by providing the system as well as the user with vital performance information about the program. In particular, it supports automatic data distribution generation and the intelligent selection of transformation strategies, based on properties of the algorithm and characteristics of the target architecture.

The tool has been implemented. Experiments show a strong correlation between the statically computed parameters and actual measurements; furthermore, the predicted parameter values allow a realistic ranking of different program versions with respect to the actual runtime.
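To make the ranking claim concrete, the following is a minimal sketch of how one might check whether predicted parameter values order program versions the same way as measured runtimes. It is not the PPPT's method: it uses Spearman's rank correlation as a stand-in, the function names `ranks` and `spearman` are hypothetical, and the numbers are made-up placeholders rather than results from the paper.

```python
def ranks(values):
    """Return 1-based ranks (smallest value gets rank 1). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(predicted, measured):
    """Spearman rank correlation between two equally long, tie-free sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rank_p, rank_m = ranks(predicted), ranks(measured)
    # Sum of squared rank differences between the two orderings.
    d2 = sum((a - b) ** 2 for a, b in zip(rank_p, rank_m))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical placeholder values for four program versions (not from the paper):
# a statically predicted cost parameter and a measured runtime in seconds.
predicted_cost = [120.0, 95.0, 210.0, 150.0]
measured_runtime = [1.9, 1.4, 3.2, 2.1]

rho = spearman(predicted_cost, measured_runtime)
```

A value of `rho` close to 1 would indicate that the predicted parameter ranks the versions in the same order as the measurements; a value close to -1 would indicate the opposite order.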