Over the last decade, performance prediction for industrial and scientific workloads on massively parallel high-performance computing systems has remained an active research area. As applications grow more complex, deriving an analytical performance model from current workloads becomes increasingly challenging: automatically generated models often predict performance inaccurately, while manually constructed analytical models predict better but are very labor-intensive to build. Our approach aims to close the gap between compiler-supported automatic model construction and manual analytical modeling of workloads. Commonly, performance-counter values are used to validate the model, so that prediction errors can be determined and quantified. Instead of manually instrumenting the executable to access performance counters, we modified the GCC compiler to insert calls to run-time system functions. Added compiler options let the user control the instrumentation process. The instrumentation then focuses on frequently executed code parts. As in established frameworks, a run-time system tracks the application behavior: traces are generated at run time, enabling the construction of architecture-independent models (using quadratic programming) and, thus, the prediction of larger workloads. In this paper, we introduce our framework and demonstrate its applicability to benchmarks as well as real-world numerical workloads. The experiments reveal an average prediction error of 9% for larger workloads.
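To make the idea of fitting a model to traces with quadratic programming more concrete, the following is a minimal sketch and not the framework's actual method. It assumes a hypothetical model form (constant, linear, and quadratic terms in a workload size `n`), uses made-up synthetic measurements in place of real traces, and solves the non-negative least-squares problem (one simple kind of quadratic program) by projected coordinate descent. All function names are illustrative.

```python
def features(n):
    """Hypothetical model terms: constant, linear, and quadratic in workload size n."""
    return [1.0, float(n), float(n) ** 2]


def fit_nonneg_least_squares(rows, targets, max_sweeps=100_000, tol=1e-12):
    """Minimize ||A c - y||^2 subject to c >= 0 by projected coordinate descent."""
    k = len(rows[0])
    # Normal-equation form of the quadratic program: 0.5 * c^T H c - b^T c.
    H = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    c = [0.0] * k
    for _ in range(max_sweeps):
        largest_change = 0.0
        for j in range(k):
            if H[j][j] == 0.0:
                continue  # this term never appears in the data; keep it at zero
            gradient_j = sum(H[j][i] * c[i] for i in range(k)) - b[j]
            # Exact minimization along coordinate j, clipped to stay non-negative.
            updated = max(0.0, c[j] - gradient_j / H[j][j])
            largest_change = max(largest_change, abs(updated - c[j]))
            c[j] = updated
        if largest_change < tol:
            break
    return c


def predict(coeffs, n):
    """Evaluate the fitted model at workload size n."""
    return sum(w * f for w, f in zip(coeffs, features(n)))


# Synthetic stand-in for trace measurements on small workloads (made-up values).
small_sizes = [1, 2, 3, 4, 5, 6]
measured = [2.0 + 3.0 * n + 0.5 * n * n for n in small_sizes]

coeffs = fit_nonneg_least_squares([features(n) for n in small_sizes], measured)
print("fitted coefficients:", [round(w, 3) for w in coeffs])
print("predicted cost at n=20:", round(predict(coeffs, 20), 2))
```

The non-negativity constraint is an assumption of this sketch: it keeps every model term from contributing negative cost, which tends to make extrapolation to larger workloads better behaved than an unconstrained fit.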