Dynamic instrumentation of threaded applications

  • Authors:
  • Zhichen Xu;Barton P. Miller;Oscar Naim

  • Affiliations:
  • Computer Sciences Department, University of Wisconsin, Madison, WI;Computer Sciences Department, University of Wisconsin, Madison, WI;Oracle Corporation, 1000 SW Broadway, Suite 1200, Portland, OR

  • Venue:
  • Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

The use of threads is becoming commonplace in both sequential and parallel programs. This paper describes our design and initial experience with non-trace based performance instrumentation techniques for threaded programs. Our goal is to provide detailed performance data while maintaining control of instrumentation costs. We have extended Paradyn's dynamic instrumentation (which can instrument programs without recompiling or relinking) to handle threaded programs.Controlling instrumentation costs means efficient instrumentation code and avoiding locks in the instrumentation. Our design is based on low contention data structures. To associate performance data with individual threads, we have all threads share the same instrumentation code and assign each thread with its own private copy of performance counters or timers. The asynchrony in a threaded program poses a major challenge to dynamic instrumentation. To implement time-based metrics on a per-thread basis, we need to instrument thread context switches, which can cause instrumentation code to interleave. Interleaved instrumentation can not only corrupt performance data, but can also cause a scenario we call self-deadlock where an instrumentation code deadlocks a thread. We introduce thread-conscious locks to avoid self-deadlock, and per-thread virtual CPU timers to reduce the chance of interleaved instrumentation accessing the same performance counter or timer, and to reduce the number of expensive timer calls at thread context switches.Our initial implementation is on SPARC Solaris 2.5 and 2.6 including multiprocessor Sun UltraSPARC Enterprise machines. We tested our tool on large multithreaded applications, including the Java Virtual Machine (JVM). We show how our new techniques helped us to speed up a Java graphics native method by 42% and consequently increase by 24% the amount of work that can be done in unit time in a game applet.