Compiler-Directed performance model construction for parallel programs

  • Authors:
  • Martin Schindewolf;David Kramer;Marcelo Cintra

  • Affiliations:
  • Institute of Computer Science & Engineering, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany;Institute of Computer Science & Engineering, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany;School of Informatics, University of Edinburgh, Edinburgh, United Kingdom

  • Venue:
  • ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

During the last decade, performance prediction for industrial and scientific workloads on massively parallel high-performance computing systems has been and still is an active research area. Due to the complexity of applications, the approach to deriving an analytical performance model from current workloads becomes increasingly challenging: automatically generated models often suffer from inaccurate performance prediction; manually constructed analytical models show better prediction, but are very labor-intensive. Our approach aims at closing the gap between compiler-supported automatic model construction and the manual analytical modeling of workloads. Commonly, performance-counter values are used to validate the model, so that prediction errors can be determined and quantified. Instead of manually instrumenting the executable for accessing performance counters, we modified the GCC compiler to insert calls to run-time system functions. Added compiler options enable the user to control the instrumentation process. Subsequently, the instrumentation focuses on frequently executed code parts. Similar to established frameworks, a run-time system is used to track the application behavior: traces are generated at run-time, enabling the construction of architecture independent models (using quadratic programming) and, thus, the prediction of larger workloads. In this paper, we introduce our framework and demonstrate its applicability to benchmarks as well as real world numerical workloads. The experiments reveal an average error rate of 9% for the prediction of larger workloads.