Benchmapping is a performance prediction method for data-parallel programs that is based on modeling the performance of runtime systems. This paper describes a benchmapping system, called BenchCvl, that predicts the running time of data-parallel programs written in the NESL language on several computer systems. BenchCvl predicts performance using a set of more than 200 parameterized models. The models quantify the cost of moving data between processors, as well as the cost of moving data within the local memory hierarchy of each processor. The parameters for the models are automatically estimated from measurements of the execution time of runtime system calls on each computer system.
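The idea of estimating model parameters from timing measurements can be illustrated with a minimal sketch. It is not BenchCvl's implementation: the abstract does not give the form of the models, so the sketch assumes each runtime call costs a fixed overhead plus a per-element cost and fits those two parameters by ordinary least squares. The function names, the call names `elementwise_add` and `permute`, and the timing values are all hypothetical and chosen only for illustration.

```python
# Minimal sketch of a benchmapping-style workflow, under stated assumptions:
#   1. each runtime call is modeled as time = overhead + per_element * size
#      (an assumed model form, not taken from the paper);
#   2. parameters are estimated by ordinary least squares from measurements;
#   3. a program's running time is predicted by summing the modeled cost
#      of the runtime calls it makes.


def fit_linear_cost(sizes, times):
    """Return (overhead, per_element) minimizing squared error of
    times ~ overhead + per_element * size."""
    n = len(sizes)
    if n < 2 or n != len(times):
        raise ValueError("need at least two (size, time) pairs of equal length")
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    per_element = sxy / sxx
    overhead = mean_y - per_element * mean_x
    return overhead, per_element


def predict_call(model, size):
    """Predicted time of one runtime call on an input of the given size."""
    overhead, per_element = model
    return overhead + per_element * size


def predict_program(models, trace):
    """Sum the predicted cost of every (call_name, size) pair in a trace."""
    return sum(predict_call(models[name], size) for name, size in trace)


# Hypothetical measurements (seconds) for two made-up runtime calls.
# The values are synthetic, generated from known parameters so the fit
# can be checked by eye; they are not real benchmark results.
sizes = [1_000, 2_000, 4_000, 8_000]
measured = {
    "elementwise_add": [50e-6 + 2e-9 * n for n in sizes],
    "permute": [80e-6 + 15e-9 * n for n in sizes],
}

models = {name: fit_linear_cost(sizes, times) for name, times in measured.items()}

# A hypothetical program trace: which runtime calls run, and on how much data.
trace = [("elementwise_add", 100_000), ("permute", 100_000)]
print(predict_program(models, trace))
```

A real system would time the calls on each target machine rather than use synthetic values, and a two-parameter linear model would probably not capture effects such as cache capacity or network contention. That limitation is one plausible reason a system would need many models rather than one per call.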