Automatic Construction and Evaluation of Performance Skeletons
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Performance prediction is particularly challenging for dynamic environments that cannot be modeled well because of factors such as resource sharing and foreign system components. The approach to performance prediction taken in this work is based on the concept of a performance skeleton, which is a short-running program whose execution time in any scenario reflects the estimated execution time of the application it represents. The fundamental technical challenge addressed in this paper is the automatic construction of performance skeletons for parallel MPI programs. The steps in the skeleton construction procedure are 1) generation of process execution traces and conversion to a single coordinated logical program trace, 2) compression of the logical program trace, and 3) conversion to an executable parallel skeleton program. Results are presented to validate the construction methodology and prediction power of performance skeletons. The execution scenarios analyzed involve network sharing, different architectures, and different MPI libraries. The emphasis is on identifying the strengths and limitations of this approach to performance prediction.
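To make step 2 (compression of the logical program trace) more concrete, the following is a minimal sketch of one way a flat event trace could be folded into loop structures. It is an illustration only and is not taken from the paper: the event names, the `compress_trace` and `expand_trace` helpers, and the greedy folding strategy are all assumptions chosen for brevity.

```python
def compress_trace(events):
    """Greedily fold immediately repeated blocks into (body, count) loops.

    `events` is a list of strings naming trace events (hypothetical labels).
    The result mixes plain events with (tuple_of_events, repeat_count) pairs.
    """
    out = []
    i = 0
    n = len(events)
    while i < n:
        best_len, best_reps = 0, 1
        # Try every block length that could repeat at least twice from i.
        for length in range(1, (n - i) // 2 + 1):
            block = events[i:i + length]
            reps = 1
            while events[i + reps * length:i + (reps + 1) * length] == block:
                reps += 1
            # Keep the fold that covers the most events (shortest body on ties).
            if reps > 1 and reps * length > best_len * best_reps:
                best_len, best_reps = length, reps
        if best_reps > 1:
            out.append((tuple(events[i:i + best_len]), best_reps))
            i += best_len * best_reps
        else:
            out.append(events[i])
            i += 1
    return out


def expand_trace(compressed):
    """Inverse of compress_trace: unroll (body, count) loops into a flat list."""
    out = []
    for node in compressed:
        if isinstance(node, tuple):
            body, count = node
            out.extend(list(body) * count)
        else:
            out.append(node)
    return out


# Hypothetical logical trace: three identical iterations, then a barrier.
trace = ["compute", "send", "recv"] * 3 + ["barrier"]
print(compress_trace(trace))
# → [(('compute', 'send', 'recv'), 3), 'barrier']
```

A loop-structured trace of this kind is easier to turn into an executable program than a flat event list, since each `(body, count)` pair maps naturally onto a loop in generated code. This sketch folds only one level of repetition and ignores event parameters such as message sizes and timing, which a real skeleton generator would need to account for.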