Comprehensive job level resource usage measurement and analysis for XSEDE HPC systems

  • Authors:
  • Charng-Da Lu; James Browne; Robert L. DeLeon; John Hammond; William Barth; Thomas R. Furlani; Steven M. Gallo; Matthew D. Jones; Abani K. Patra

  • Affiliations:
  • SUNY at Buffalo, Buffalo, NY; University of Texas, Austin, TX; SUNY at Buffalo, Buffalo, NY; University of Texas, Austin, TX; University of Texas, Austin, TX; SUNY at Buffalo, Buffalo, NY; SUNY at Buffalo, Buffalo, NY; SUNY at Buffalo, Buffalo, NY; SUNY at Buffalo, Buffalo, NY

  • Venue:
  • Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery
  • Year:
  • 2013

Abstract

This paper presents a methodology for comprehensive job-level resource use measurement and analysis, applications of the analyses to planning for HPC systems, and a case study applying the methodology to the XSEDE Ranger and Lonestar4 systems at the University of Texas. The steps in the methodology are system-wide collection of resource use and performance statistics at the job and node levels, followed by mapping and storage of the resulting job-wise data in a relational database, which eases further implementation and the transformation of data to the formats required by specific statistical and analytical algorithms. Analyses can be carried out at different levels of granularity: per job, per user, or system-wide. Measurements are based on a novel lightweight job-centric measurement tool, "TACC_Stats" [1], which gathers a comprehensive set of metrics on all compute nodes. The data mapping and analysis tools will be an extension to the XDMoD project [2] for the XSEDE community. This paper also reports preliminary results from the analysis of measured data for the Texas Advanced Computing Center's Lonestar4 and Ranger supercomputers. The case studies presented indicate the level of detailed information that will be available for all resources when TACC_Stats is deployed throughout the XSEDE system. The methodology can be applied to any system that runs the TACC_Stats measurement tool.
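
To make the store-then-aggregate idea in the abstract concrete, the following is a minimal sketch of per-node usage records loaded into a relational table and then summarized per job, per user, and system-wide. It assumes an in-memory SQLite database; the table name `node_usage`, its columns, and the sample values are hypothetical and chosen only for illustration. They are not the actual TACC_Stats output format or the XDMoD schema.

```python
import sqlite3

# Hypothetical per-node records: (job_id, user, node, cpu_user_frac, mem_used_gb).
# Illustrative values only; not real TACC_Stats data or the XDMoD schema.
SAMPLES = [
    (101, "alice", "n001", 0.75, 8.0),
    (101, "alice", "n002", 0.25, 16.0),
    (102, "alice", "n003", 0.50, 4.0),
    (103, "bob", "n004", 0.25, 2.0),
]


def build_db(samples):
    """Load per-node records into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node_usage ("
        "job_id INTEGER, user TEXT, node TEXT, "
        "cpu_user_frac REAL, mem_used_gb REAL)"
    )
    conn.executemany("INSERT INTO node_usage VALUES (?, ?, ?, ?, ?)", samples)
    return conn


def per_job(conn):
    """Job granularity: node count, mean CPU fraction, total memory per job."""
    return conn.execute(
        "SELECT job_id, user, COUNT(*), AVG(cpu_user_frac), SUM(mem_used_gb) "
        "FROM node_usage GROUP BY job_id, user ORDER BY job_id"
    ).fetchall()


def per_user(conn):
    """User granularity: number of jobs and mean CPU fraction over node records."""
    return conn.execute(
        "SELECT user, COUNT(DISTINCT job_id), AVG(cpu_user_frac) "
        "FROM node_usage GROUP BY user ORDER BY user"
    ).fetchall()


def system_wide(conn):
    """System granularity: total jobs and mean CPU fraction over all node records."""
    return conn.execute(
        "SELECT COUNT(DISTINCT job_id), AVG(cpu_user_frac) FROM node_usage"
    ).fetchone()


if __name__ == "__main__":
    conn = build_db(SAMPLES)
    for row in per_job(conn):
        print("job", row)
    for row in per_user(conn):
        print("user", row)
    print("system", system_wide(conn))
```

Keeping the records in one relational table means each level of granularity is just a different GROUP BY over the same data, which is one plausible reading of why a database eases later transformation into the formats that analysis algorithms require.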