This paper describes our experience in modeling two significant parallel applications: ARC2D, a two-dimensional Euler solver, and Xtrid, a tridiagonal linear solver. Both models were expressed in BDL (Behavior Description Language) and simulated on an iPSC/860 Hypercube modeled using Axe (Abstract eXecution Environment). BDL models consist of abstract communicating objects: blocks of sequential code are modeled by single RUN statements, and every communication operation in the original code is mirrored by a corresponding BDL operation in the model. Our ARC2D model was built by first profiling the program to locate the significant loops and then timing the basic blocks within those loops. Except in one case, simulated completion times were within 8% of measured execution times. However, lengthy simulations were necessary to predict the performance of large-scale runs. For Xtrid, only the loops surrounding communications were modeled; other loops were absorbed into large sequential blocks whose complexity was estimated using statistical regression. This approach yielded a much smaller model whose computation and communication complexities were clearly manifest. Analyzing these complexities allowed rapid prediction of large-scale performance without lengthy simulations. Analytically predicted speed-ups were within 7% of those predicted by simulation, and simulated completion times were within 5% of measured execution times. The second approach therefore provides a more effective methodology for simulation-based performance tuning.
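The regression-based approach described for Xtrid can be illustrated with a minimal sketch. It is not the authors' BDL or Axe model: the linear cost form, the function names, the timing samples, and the communication parameters below are all assumptions chosen for illustration. The sketch fits the cost of a sequential block to measured timings by ordinary least squares, then combines that fit with a simple latency-plus-bandwidth message cost to estimate speed-up analytically.

```python
def fit_linear(sizes, times):
    """Least-squares fit of time = a + b * size (assumed linear cost form)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predicted_time(n, p, a, b, msgs, latency, per_byte, msg_bytes):
    """Estimated completion time for problem size n on p processors.

    Computation: the fitted block cost applied to each processor's share n / p.
    Communication: msgs messages, each costing latency + per_byte * msg_bytes
    (a hypothetical cost model; no messages are sent when p == 1).
    """
    compute = a + b * (n / p)
    comm = 0.0 if p == 1 else msgs * (latency + per_byte * msg_bytes)
    return compute + comm


def predicted_speedup(n, p, a, b, msgs, latency, per_byte, msg_bytes):
    """Ratio of the single-processor estimate to the p-processor estimate."""
    t1 = predicted_time(n, 1, a, b, msgs, latency, per_byte, msg_bytes)
    tp = predicted_time(n, p, a, b, msgs, latency, per_byte, msg_bytes)
    return t1 / tp


if __name__ == "__main__":
    # Hypothetical timing samples (seconds) for one sequential block.
    sizes = [1000, 2000, 4000, 8000]
    times = [0.012, 0.022, 0.042, 0.082]
    a, b = fit_linear(sizes, times)

    # Hypothetical communication parameters, not measured iPSC/860 values.
    for p in (1, 2, 4, 8):
        s = predicted_speedup(64000, p, a, b, msgs=6,
                              latency=1e-4, per_byte=4e-7, msg_bytes=8000)
        print(f"p={p}: predicted speed-up {s:.2f}")
```

Once the coefficients are fitted, evaluating the formula for a larger problem size or processor count takes negligible time, which is the kind of rapid large-scale prediction the abstract contrasts with lengthy simulation. A real model would need a cost form and communication pattern matched to the application.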