Developing distributed and parallel programs on today's multiprocessor architectures is still a challenging task. Particularly distressing is the lack of effective performance tools that support the programmer in evaluating changes in code, problem and machine sizes, and target architectures. In this paper we introduce $P^3T+$, a performance estimator for mostly regular HPF (High Performance Fortran) programs that also partially covers message passing programs (MPI). $P^3T+$ is unique in modeling programs, compiler code transformations, and parallel and distributed architectures. It computes at compile time a variety of performance parameters, including work distribution, number of transfers, amount of data transferred, transfer times, computation times, and number of cache misses. Several novel technologies are employed to compute these parameters: loop iteration spaces, array access patterns, and data distributions are modeled by employing highly effective symbolic analysis. Communication is estimated by simulating the behavior of a communication library used by the underlying compiler. Computation times are predicted through pre-measured kernels on every target architecture of interest. We carefully model the most critical architecture-specific factors, such as cache line sizes, number of cache lines available, startup times, and message transfer time per byte. $P^3T+$ has been implemented and is closely integrated with the Vienna High Performance Compiler (VFC) to support programmers in developing parallel and distributed applications. Experimental results for realistic kernel codes taken from real-world applications are presented to demonstrate both the accuracy and the usefulness of $P^3T+$.
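To make two of the parameters named in the abstract concrete, the following is a minimal sketch of how transfer time and work distribution might be estimated from machine parameters. It is not the model used by $P^3T+$: the linear startup-plus-per-byte cost formula, the block distribution rule, and all names (`Machine`, `transfer_time`, `block_work`) and numeric values are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine parameters (illustrative, not measured values)."""
    startup_s: float   # assumed startup time per message, in seconds
    per_byte_s: float  # assumed transfer time per byte, in seconds


def transfer_time(machine: Machine, num_transfers: int, total_bytes: int) -> float:
    """Assumed linear cost model: every transfer pays the startup time,
    every byte pays the per-byte time."""
    return num_transfers * machine.startup_s + total_bytes * machine.per_byte_s


def block_work(n_iterations: int, n_procs: int) -> List[int]:
    """Iterations assigned to each processor when a loop of n_iterations
    is split into contiguous blocks of ceil(n_iterations / n_procs)."""
    block = -(-n_iterations // n_procs)  # ceiling division
    return [max(0, min(block, n_iterations - p * block)) for p in range(n_procs)]


if __name__ == "__main__":
    m = Machine(startup_s=1e-5, per_byte_s=1e-8)  # made-up values
    print(transfer_time(m, num_transfers=4, total_bytes=8000))
    print(block_work(10, 4))
```

Under these assumptions, an uneven result from `block_work` (for example, a last processor with fewer iterations than the others) is the kind of load imbalance that a work distribution parameter is meant to expose before the program is run.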