WMTools - assessing parallel application memory utilisation at scale

  • Authors:
  • Oliver Perks; Simon D. Hammond; Simon J. Pennycook; Stephen A. Jarvis

  • Affiliations:
  • Performance Computing and Visualisation, Department of Computer Science, University of Warwick, UK (all authors)

  • Venue:
  • EPEW'11 Proceedings of the 8th European conference on Computer Performance Engineering
  • Year:
  • 2011

Abstract

The divergence between processor and memory performance has been a well-discussed topic in the computer architecture literature for some years. The recent adoption of multi-core processor designs has, however, brought new problems to the design of memory architectures: as more cores are added to each successive generation of processor, equivalent improvements in memory capacity and memory sub-systems must be made if the compute components of the processor are to remain sufficiently supplied with data. These issues, combined with the traditional problem of designing cache-efficient code, ensure that memory remains an ongoing challenge for application and machine designers. In this paper we present a comprehensive discussion of WMTools, a trace-based toolkit designed to support the analysis of memory allocation in parallel applications. The paper extends the discussion of the WMTrace tracing tool presented in previous work, including a revised treatment of trace compression and several refinements to the tracing methodology that reduce overheads and improve tool scalability. The second half of the paper presents a case study in which we apply WMTools to five parallel scientific applications and benchmarks, demonstrating its effectiveness at recording high-water mark memory consumption as well as per-function memory use over time. An in-depth analysis is provided for an unstructured mesh benchmark, which reveals significant memory allocation imbalance across its participating processes. This study demonstrates the use of WMTools in elucidating memory allocation issues in high-performance scientific codes.
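
To make the idea of a high-water mark concrete, the following minimal sketch replays a list of allocation events and reports the peak number of live bytes, together with the per-function breakdown at that peak. It is an illustration only and does not reflect the WMTools implementation or its trace format: the event tuple layout `(op, address, size, function)`, the name `high_water_mark`, and the sample trace are all hypothetical.

```python
from collections import defaultdict


def high_water_mark(events):
    """Replay allocation events and return (peak_bytes, per_function_at_peak).

    Each event is a hypothetical (op, address, size, function) tuple, where
    op is "malloc" or "free". This layout is an assumption for illustration,
    not the WMTools trace format.
    """
    live = {}                        # address -> (size, allocating function)
    per_function = defaultdict(int)  # live bytes attributed to each function
    current = 0                      # bytes live right now
    peak = 0                         # largest value of `current` seen so far
    peak_snapshot = {}               # per-function live bytes at the peak

    for op, addr, size, func in events:
        if op == "malloc":
            live[addr] = (size, func)
            current += size
            per_function[func] += size
        elif op == "free" and addr in live:
            # Credit the bytes back to the function that allocated them.
            freed_size, owner = live.pop(addr)
            current -= freed_size
            per_function[owner] -= freed_size

        if current > peak:
            peak = current
            peak_snapshot = {f: b for f, b in per_function.items() if b > 0}

    return peak, peak_snapshot


# Hypothetical trace for a single process.
trace = [
    ("malloc", 0x1000, 512, "read_mesh"),
    ("malloc", 0x2000, 256, "build_halo"),
    ("free",   0x1000, 0,   "read_mesh"),
    ("malloc", 0x3000, 128, "solve"),
]

peak, by_function = high_water_mark(trace)
print(peak, by_function)
```

Running the same replay once per process and comparing the resulting peaks is one simple way to expose the kind of cross-process allocation imbalance described in the abstract.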