UNIX internals: the new frontiers
Programming with GNU software
Performance measurements for multithreaded programs
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
MDL: A Language And Compiler For Dynamic Program Instrumentation
PACT '97 Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques
Performance measurement of dynamically compiled Java executions
JAVA '99 Proceedings of the ACM 1999 conference on Java Grande
A Callgraph-Based Search Strategy for Automated Performance Diagnosis (Distinguished Paper)
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Vertical profiling: understanding the behavior of object-oriented applications
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Dynamic instrumentation of production systems
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Using hardware performance monitors to understand the behavior of Java applications
VM'04 Proceedings of the 3rd conference on Virtual Machine Research And Technology Symposium - Volume 3
DITools: application-level support for dynamic extension and flexible composition
ATEC '00 Proceedings of the annual conference on USENIX Annual Technical Conference
Middleware Support for Performance Improvement of MABS Applications in the Grid Environment
Multi-Agent-Based Simulation VIII
Software—Practice & Experience
An efficient multi-level trace toolkit for multi-threaded applications
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
The use of threads is becoming commonplace in both sequential and parallel programs. This paper describes our design and initial experience with non-trace-based performance instrumentation techniques for threaded programs. Our goal is to provide detailed performance data while maintaining control of instrumentation costs. We have extended Paradyn's dynamic instrumentation (which can instrument programs without recompiling or relinking) to handle threaded programs.

Controlling instrumentation costs means keeping the instrumentation code efficient and avoiding locks in the instrumentation. Our design is based on low-contention data structures. To associate performance data with individual threads, we have all threads share the same instrumentation code and assign each thread its own private copy of performance counters or timers. The asynchrony in a threaded program poses a major challenge to dynamic instrumentation. To implement time-based metrics on a per-thread basis, we need to instrument thread context switches, which can cause instrumentation code to interleave. Interleaved instrumentation can not only corrupt performance data but also cause a scenario we call self-deadlock, in which instrumentation code deadlocks a thread. We introduce thread-conscious locks to avoid self-deadlock, and per-thread virtual CPU timers to reduce the chance of interleaved instrumentation accessing the same performance counter or timer and to reduce the number of expensive timer calls at thread context switches.

Our initial implementation is on SPARC Solaris 2.5 and 2.6, including multiprocessor Sun UltraSPARC Enterprise machines. We tested our tool on large multithreaded applications, including the Java Virtual Machine (JVM). We show how our new techniques helped us speed up a Java graphics native method by 42% and consequently increase by 24% the amount of work that can be done per unit time in a game applet.
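
As a rough illustration of two ideas named in the abstract, the sketch below shows threads sharing one piece of counting code while each writes only to its own counter slot, together with a lock that refuses re-entry by its current owner instead of blocking. This is not Paradyn's implementation. The names `PerThreadCounter`, `ThreadConsciousLock`, `run_demo`, and `MAX_THREADS` are hypothetical, and the "refuse and skip" behavior is only an assumption about how a thread-conscious lock might avoid self-deadlock.

```python
import threading

MAX_THREADS = 8  # hypothetical fixed size of the per-thread slot table


class PerThreadCounter:
    """Shared counting code, private data: each thread updates only its own slot."""

    def __init__(self, slots=MAX_THREADS):
        self.slots = [0] * slots

    def add(self, slot, amount=1):
        # No lock is taken: a slot has exactly one writer thread.
        self.slots[slot] += amount

    def total(self):
        return sum(self.slots)


class ThreadConsciousLock:
    """Lock that records its owner and refuses re-entry by that same thread.

    Assumption: if instrumentation is interleaved and the owning thread asks
    for the lock again, returning False lets it skip the update rather than
    block on itself forever.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def acquire(self):
        me = threading.get_ident()
        if self._owner == me:
            return False  # blocking here would be a self-deadlock
        self._lock.acquire()
        self._owner = me
        return True

    def release(self):
        self._owner = None
        self._lock.release()


def run_demo(num_threads=4, increments=1000):
    """Start num_threads workers; worker i increments slot i of one counter."""
    counter = PerThreadCounter()

    def worker(slot):
        for _ in range(increments):
            counter.add(slot)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Because every slot has a single writer, the per-thread values can be read or summed afterwards without any synchronization on the update path, which is the property the low-contention design relies on.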