Integrated runtime measurement summarisation and selective event tracing for scalable parallel execution performance diagnosis

  • Authors:
  • Brian J. N. Wylie;Felix Wolf;Bernd Mohr;Markus Geimer

  • Affiliations:
  • John von Neumann Institute for Computing, Forschungszentrum Jülich, Jülich, Germany;John von Neumann Institute for Computing, Forschungszentrum Jülich, Jülich, Germany and Dept. Computer Science, RWTH Aachen University, Aachen, Germany;John von Neumann Institute for Computing, Forschungszentrum Jülich, Jülich, Germany;John von Neumann Institute for Computing, Forschungszentrum Jülich, Jülich, Germany

  • Venue:
  • PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
  • Year:
  • 2006

Abstract

Straightforward trace collection and processing become increasingly challenging, and ultimately impractical, for more complex, long-running, highly parallel applications. Accordingly, the SCALASCA project is extending the KOJAK measurement system for MPI, OpenMP and partitioned global address space (PGAS) parallel applications to incorporate runtime management and summarisation capabilities. This offers a more scalable and effective profile of parallel execution performance, which serves as an initial overview and directs instrumentation and event tracing to the key functions and callpaths that warrant comprehensive analysis. The design and restructuring of the revised measurement system are described, highlighting the synergies possible from integrated runtime callpath summarisation and event tracing for scalable parallel execution performance diagnosis. Early results from measurements of 16,384 MPI processes on IBM BlueGene/L already demonstrate considerably improved scalability.