Dynamic instrumentation of threaded applications

Authors:
Zhichen Xu;Barton P. Miller;Oscar Naim
Affiliations:
Computer Sciences Department, University of Wisconsin, Madison, WI;Computer Sciences Department, University of Wisconsin, Madison, WI;Oracle Corporation, 1000 SW Broadway, Suite 1200, Portland, OR
Venue:
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
1999

Abstract

The use of threads is becoming commonplace in both sequential and parallel programs. This paper describes our design and initial experience with non-trace-based performance instrumentation techniques for threaded programs. Our goal is to provide detailed performance data while maintaining control of instrumentation costs. We have extended Paradyn's dynamic instrumentation (which can instrument programs without recompiling or relinking) to handle threaded programs.

Controlling instrumentation costs means using efficient instrumentation code and avoiding locks in the instrumentation. Our design is based on low-contention data structures. To associate performance data with individual threads, we have all threads share the same instrumentation code and assign each thread its own private copy of performance counters or timers. The asynchrony in a threaded program poses a major challenge to dynamic instrumentation. To implement time-based metrics on a per-thread basis, we need to instrument thread context switches, which can cause instrumentation code to interleave. Interleaved instrumentation can not only corrupt performance data but can also cause a scenario we call self-deadlock, where instrumentation code deadlocks a thread. We introduce thread-conscious locks to avoid self-deadlock, and per-thread virtual CPU timers to reduce the chance of interleaved instrumentation accessing the same performance counter or timer and to reduce the number of expensive timer calls at thread context switches.

Our initial implementation is on SPARC Solaris 2.5 and 2.6, including multiprocessor Sun UltraSPARC Enterprise machines. We tested our tool on large multithreaded applications, including the Java Virtual Machine (JVM). We show how our new techniques helped us to speed up a Java graphics native method by 42% and consequently increase by 24% the amount of work that can be done per unit time in a game applet.
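
The two ideas of shared instrumentation code with private per-thread counters and a lock that is aware of its owner can be illustrated with a small sketch. This is not the Paradyn implementation, which instruments native binaries on SPARC Solaris. It is a Python illustration under stated assumptions. The names `PerThreadCounters`, `ThreadConsciousLock`, `try_acquire`, and `run_demo` are hypothetical, and the fail-fast behavior when a thread re-requests a lock it already holds is an assumption about how a thread-conscious lock might avoid self-deadlock, since the abstract does not give the details.

```python
import threading


class PerThreadCounters:
    """All threads run the same increment code; each updates its own private slot."""

    def __init__(self):
        self._local = threading.local()          # per-thread storage
        self._all_slots = []                     # every slot, for later aggregation
        self._register_lock = threading.Lock()   # taken once per thread, not per event

    def _slot(self):
        slot = getattr(self._local, "slot", None)
        if slot is None:
            # First event on this thread: create and register its private counter.
            slot = [0]
            self._local.slot = slot
            with self._register_lock:
                self._all_slots.append(slot)
        return slot

    def increment(self, amount=1):
        # Hot path: only the calling thread's slot is touched, so no lock is needed.
        self._slot()[0] += amount

    def values(self):
        return [slot[0] for slot in self._all_slots]


class ThreadConsciousLock:
    """Lock that records its owner (assumed design, see lead-in).

    If the owning thread asks for the lock again, for example because
    context-switch instrumentation interrupted instrumentation that already
    holds it, the request fails immediately instead of blocking forever.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def try_acquire(self):
        me = threading.get_ident()
        if self._owner == me:
            return False          # would self-deadlock; caller should skip its update
        self._lock.acquire()      # held by another thread or free: wait normally
        self._owner = me
        return True

    def release(self):
        self._owner = None
        self._lock.release()


def run_demo(num_threads=4, events_per_thread=1000):
    """Count events on several threads and return the per-thread totals, sorted."""
    counters = PerThreadCounters()

    def worker():
        for _ in range(events_per_thread):
            counters.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(counters.values())
```

In this sketch the only lock on the counter path is taken once per thread at registration, which mirrors the stated goal of avoiding locks in the instrumentation itself. A caller that gets `False` from `try_acquire` is expected to skip or defer its update rather than wait.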