Quartz: a tool for tuning parallel program performance

  • Authors:
  • Thomas E. Anderson;Edward D. Lazowska

  • Affiliations:
  • Department of Computer Science and Engineering, University of Washington, Seattle WA;Department of Computer Science and Engineering, University of Washington, Seattle WA

  • Venue:
  • SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
  • Year:
  • 1990

Quantified Score

Hi-index 0.00

Visualization

Abstract

Initial implementations of parallel programs typically yield disappointing performance. Tuning to improve performance is thus a significant part of the parallel programming process. The effort required to tune a parallel program, and the level of performance that eventually is achieved, both depend heavily on the quality of the instrumentation that is available to the programmer.This paper describes Quartz, a new tool for tuning parallel program performance on shared memory multiprocessors. The philosophy underlying Quartz was inspired by that of the sequential UNIX tool gprof: to appropriately direct the attention of the programmer by efficiently measuring just those factors that are most responsible for performance and by relating these metrics to one another and to the structure of the program. This philosophy is even more important in the parallel domain than in the sequential domain, because of the dramatically greater number of possible metrics and the dramatically increased complexity of program structures.The principal metric of Quartz is normalized processor time: the total processor time spent in each section of code divided by the number of other processors that are concurrently busy when that section of code is being executed. Tied to the logical structure of the program, this metric provides a “smoking gun” pointing towards those areas of the program most responsible for poor performance. This information can be acquired efficiently by checkpointing to memory the number of busy processors and the state of each processor, and then statistically sampling these using a dedicated processor.In addition to describing the design rationale, functionality, and implementation of Quartz, the paper examines how Quartz would be used to solve a number of performance problems that have been reported as being frequently encountered, and describes a case study in which Quartz was used to significantly improve the performance of a CAD circuit verifier.