Compiler-Directed Performance Model Construction for Parallel Programs

Authors:
Martin Schindewolf; David Kramer; Marcelo Cintra
Affiliations:
Institute of Computer Science & Engineering, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany; Institute of Computer Science & Engineering, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany; School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
Venue:
ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Year:
2010

Abstract

Over the last decade, performance prediction for industrial and scientific workloads on massively parallel high-performance computing systems has remained an active research area. As applications grow more complex, deriving an analytical performance model from current workloads becomes increasingly challenging: automatically generated models often predict performance inaccurately, while manually constructed analytical models predict better but are very labor-intensive to build. Our approach aims to close the gap between compiler-supported automatic model construction and manual analytical modeling of workloads. Commonly, performance-counter values are used to validate the model, so that prediction errors can be determined and quantified. Instead of manually instrumenting the executable to access performance counters, we modified GCC to insert calls to run-time system functions. Added compiler options enable the user to control the instrumentation process, which then focuses on frequently executed parts of the code. Similar to established frameworks, a run-time system tracks the application behavior: traces are generated at run time, enabling the construction of architecture-independent models (using quadratic programming) and, thus, the prediction of larger workloads. In this paper, we introduce our framework and demonstrate its applicability to benchmarks as well as real-world numerical workloads. The experiments reveal an average error rate of 9% for the prediction of larger workloads.
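
The abstract states that models are built from run-time traces using quadratic programming, but it does not give the formulation. As an illustration only, the sketch below assumes one common choice: least-squares fitting with non-negative coefficients, which is a small quadratic program. The basis functions (constant, linear, quadratic in the problem size n), the function names `fit_nonnegative` and `predict`, and the sample measurements are all hypothetical and are not taken from the paper. Because there are only three basis functions, the program is solved exactly by trying every subset of active terms rather than by calling a QP solver.

```python
from itertools import combinations

# Hypothetical basis functions of the problem size n (an assumption, not from the paper).
BASIS = [
    lambda n: 1.0,            # constant overhead
    lambda n: float(n),       # work linear in n
    lambda n: float(n) ** 2,  # work quadratic in n
]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting; None if singular."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))
        x[r] = s / m[r][r]
    return x


def fit_nonnegative(sizes, times, basis=BASIS):
    """Minimise sum((model(n) - t)^2) subject to all coefficients >= 0."""
    rows = [[f(n) for f in basis] for n in sizes]
    # Scale each column to improve conditioning; scaling keeps signs unchanged.
    scale = [max(abs(row[j]) for row in rows) or 1.0 for j in range(len(basis))]
    rows = [[v / s for v, s in zip(row, scale)] for row in rows]
    best_err, best_coef = float("inf"), None
    for k in range(len(basis) + 1):
        for support in combinations(range(len(basis)), k):
            # Unconstrained least squares restricted to the chosen terms.
            gram = [[sum(r[i] * r[j] for r in rows) for j in support] for i in support]
            rhs = [sum(r[i] * t for r, t in zip(rows, times)) for i in support]
            sol = solve_linear(gram, rhs)
            if sol is None or any(v < 0 for v in sol):
                continue  # singular system or constraint violated
            coef = [0.0] * len(basis)
            for i, v in zip(support, sol):
                coef[i] = v
            err = sum((sum(c * x for c, x in zip(coef, r)) - t) ** 2
                      for r, t in zip(rows, times))
            if err < best_err:
                best_err, best_coef = err, coef
    return [c / s for c, s in zip(best_coef, scale)]


def predict(coef, n, basis=BASIS):
    """Evaluate the fitted model at problem size n."""
    return sum(c * f(n) for c, f in zip(coef, basis))


# Made-up measurements for small workloads, then extrapolation to a larger one.
sizes = [100, 200, 400, 800]
times = [2.0 + 0.5 * n + 0.01 * n * n for n in sizes]
coef = fit_nonnegative(sizes, times)
larger = predict(coef, 1000)
```

The non-negativity constraint keeps every term physically interpretable as a cost contribution, which is one plausible reason to prefer a constrained fit over plain least squares when extrapolating to larger workloads. Enumerating subsets is only practical for a handful of terms; a model with many basis functions would call for a dedicated QP or NNLS solver.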