Supporting Scalable Performance Monitoring and Analysis of Parallel Programs

  • Authors:
  • Kei-Chun Li; Kang Zhang

  • Affiliations:
  • Department of Computing, Macquarie University, Sydney NSW 2109, Australia danielli@mpce.mq.edu.au; Department of Computing, Macquarie University, Sydney NSW 2109, Australia kang@mpce.mq.edu.au

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 1999

Abstract

Tools for performance monitoring and analysis have become indispensable parts of programming environments for parallel computers. As the number of processors increases, conventional techniques for monitoring the performance of parallel programs produce large amounts of data in the form of event trace files. This wealth of information is a problem both for the programmer, who is forced to navigate through it, and for the tools that must store and process it. Worse, a large portion of the data is often irrelevant to understanding the performance of an application. In this paper, we present a new approach for collecting performance data. By tracing all events but storing only statistics of the performance, our approach provides accurate and useful performance information while requiring far less data to be stored. The approach also supports real-time performance monitoring.
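The abstract does not say which statistics are kept or how they are organized, so the following is only an illustrative sketch of the general idea of "trace every event, store only statistics". It assumes, hypothetically, that each event is a (processor, event type, duration) triple and that durations are summarized per (processor, event type) by count, total, minimum, maximum, mean, and variance, updated online with Welford's algorithm. The names `RunningStats`, `StatisticsMonitor`, `record`, and `snapshot` are invented for this example and are not taken from the paper.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Constant-space summary of one event stream (Welford's online algorithm)."""
    count: int = 0
    total: float = 0.0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, duration: float) -> None:
        # Update every summary field in O(1) time; the event itself is discarded.
        self.count += 1
        self.total += duration
        delta = duration - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (duration - self.mean)
        self.minimum = min(self.minimum, duration)
        self.maximum = max(self.maximum, duration)

    @property
    def variance(self) -> float:
        # Population variance of the durations seen so far.
        return self.m2 / self.count if self.count > 0 else 0.0


class StatisticsMonitor:
    """Hypothetical monitor: every event is observed, but only one summary per
    (processor, event type) key is kept, so storage grows with the number of
    distinct keys rather than with the number of events."""

    def __init__(self) -> None:
        self.stats: dict[tuple[int, str], RunningStats] = defaultdict(RunningStats)

    def record(self, processor: int, event_type: str, duration: float) -> None:
        self.stats[(processor, event_type)].add(duration)

    def snapshot(self) -> dict[tuple[int, str], tuple[int, float, float]]:
        # (count, mean, max) per key; can be read at any time while the program
        # runs, which is what makes on-line (real-time) display possible.
        return {k: (s.count, s.mean, s.maximum) for k, s in self.stats.items()}


if __name__ == "__main__":
    monitor = StatisticsMonitor()
    # Simulated trace: 100,000 "send" events spread over 4 processors.
    for i in range(100_000):
        monitor.record(i % 4, "send", 1.0 + (i % 10) * 0.1)
    print(len(monitor.stats))  # 4 summaries retained instead of 100,000 trace records
```

The trade-off this sketch makes explicit is that per-event ordering and timing information are lost; only aggregates survive. Whether that is acceptable depends on which performance questions the analysis needs to answer.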