Monitoring MPI programs for performance characterization and management control

  • Authors:
  • Robert A. Ballance; Jonathan Cook

  • Affiliations:
  • Sandia National Laboratories, Albuquerque, NM; New Mexico State University, Las Cruces, NM

  • Venue:
  • Proceedings of the 2010 ACM Symposium on Applied Computing
  • Year:
  • 2010

Abstract

Monitoring distributed programs on high-performance supercomputers is challenging, yet it is essential both for the proper administration of the machines and for users who need to understand what their programs are doing during production runs. To this end, we created a flexible monitoring capability for a major class of scientific applications, programs that use MPI, which efficiently gathers information from the distributed program and collects it at a central point. These data can then be used to understand both application-centric and system-centric issues, and to improve, administer, and maintain both the complex applications that produce important scientific results and the complex systems that execute them.