Efficient, Unified, and Scalable Performance Monitoring for Multiprocessor Operating Systems

Authors:
Robert W. Wisniewski;Bryan Rosenburg
Affiliations:
IBM T. J. Watson Research Center;IBM T. J. Watson Research Center
Venue:
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Year:
2003

Citing 6
Cited 17

Fine-grained dynamic instrumentation of commodity operating system kernels

OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Managing performance analysis with dynamic statistical projection pursuit

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
The Paradyn Parallel Performance Measurement Tool

Computer
A model and tools for supporting parallel real-time applications in Unix environments

RTAS '95 Proceedings of the Real-Time Technology and Applications Symposium
Dynamic Instrumentation of Large-Scale MPI and OpenMP Applications

IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Measuring and characterizing system behavior using kernel-level event logging

ATEC '00 Proceedings of the annual conference on USENIX Annual Technical Conference

Experience with K42, an open-source, Linux-compatible, scalable operating-system kernel

IBM Systems Journal
Online performance analysis by statistical sampling of microprocessor performance counters

Proceedings of the 19th annual international conference on Supercomputing
Multiple Page Size Modeling and Optimization

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
K42: an infrastructure for operating system research

ACM SIGOPS Operating Systems Review
Spin Detection Hardware for Improved Management of Multithreaded Systems

IEEE Transactions on Parallel and Distributed Systems
Performance and environment monitoring for continuous program optimization

IBM Journal of Research and Development
K42: building a complete operating system

Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Performance monitor unit design for an AXI-based multi-core SoC platform

Proceedings of the 2007 ACM symposium on Applied computing
Dynamic instrumentation of production systems

ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Fine grained kernel logging with KLogger: experience and insights

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
K42: lessons for the OS community

ACM SIGOPS Operating Systems Review
Application heartbeats: a generic interface for specifying program performance and goals in autonomous computing environments

Proceedings of the 7th international conference on Autonomic computing
Synchronization for fast and reentrant operating system kernel tracing

Software—Practice & Experience - Focus on Selected PhD Literature Reviews in the Practical Aspects of Software Technology
Fay: extensible distributed tracing from kernels to clusters

SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Fay: Extensible Distributed Tracing from Kernels to Clusters

ACM Transactions on Computer Systems (TOCS)
Experiences understanding performance in a commercial scale-out environment

Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Lockless multi-core high-throughput buffering scheme for kernel tracing

ACM SIGOPS Operating Systems Review

Abstract

Programming, understanding, and tuning the performance of large multiprocessor systems are challenging tasks. Experts have difficulty achieving good utilization for applications on large machines. Implementing a scalable system such as an operating system or database on large machines is even more challenging. The importance of achieving good performance on multiprocessor machines is growing as both the number of cores per chip and the size of multiprocessors increase. Crucial to achieving good performance is the ability to understand the behavior of the system. We have developed an efficient, unified, and scalable tracing infrastructure that supports correctness debugging, performance debugging, and performance monitoring of an operating system. The infrastructure allows variable-length events to be logged without locking and provides random access to the event stream. It allows cheap, parallel logging of events by applications, libraries, servers, and the kernel. The infrastructure was designed for K42, a new open-source research kernel designed to scale near perfectly on large cache-coherent 64-bit multiprocessor systems. The techniques are generally applicable, and many of them have been integrated into the Linux Trace Toolkit. In this paper, we describe the implementation of the infrastructure, how we used the facility (for example, to analyze lock contention) to understand and achieve K42's scalable performance, and the lessons we learned. The infrastructure has been invaluable in achieving great scalability.
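
The abstract states that variable-length events are logged without locking and that the event stream supports random access, but it does not give the mechanism. The following is a minimal sketch of one way to get both properties, not the K42 implementation. It assumes that a writer reserves space with an atomic compare-and-swap on a shared index, and that the log is split into fixed-size buffers that no event may cross, so a reader can start parsing at any buffer boundary. All names and sizes (`trace_log`, `trace_index`, `log_event`, `count_events`, `BUF_WORDS`, `EV_FILLER`, the header layout) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, in 64-bit words. */
#define BUF_WORDS 64u                    /* words per buffer */
#define NUM_BUFS  4u
#define LOG_WORDS (BUF_WORDS * NUM_BUFS)
#define EV_FILLER 0u                     /* hypothetical id for padding events */

static uint64_t trace_log[LOG_WORDS];    /* circular log, zero-initialized */
static _Atomic uint64_t trace_index;     /* total words reserved so far */

/* Assumed header layout: event id in the high 32 bits,
 * event length in words (header included) in the low 32 bits. */
static uint64_t make_header(uint32_t id, uint32_t words)
{
    return ((uint64_t)id << 32) | words;
}
static uint32_t header_id(uint64_t h)    { return (uint32_t)(h >> 32); }
static uint32_t header_words(uint64_t h) { return (uint32_t)h; }

/* Log one event with n payload words. No lock is taken: space is claimed
 * by a compare-and-swap on trace_index, and the loop retries on conflict. */
static void log_event(uint32_t id, const uint64_t *payload, uint32_t n)
{
    assert(n < BUF_WORDS);               /* event must fit in one buffer */
    uint32_t words = n + 1;              /* payload plus header word */

    for (;;) {
        uint64_t old = atomic_load(&trace_index);
        uint32_t room = BUF_WORDS - (uint32_t)(old % BUF_WORDS);
        /* If the event does not fit, claim the rest of the buffer instead. */
        uint32_t take = (words <= room) ? words : room;

        if (!atomic_compare_exchange_weak(&trace_index, &old, old + take))
            continue;                    /* another writer moved the index */

        uint64_t *slot = &trace_log[old % LOG_WORDS];
        if (take == words) {
            slot[0] = make_header(id, words);
            if (n > 0)
                memcpy(slot + 1, payload, n * sizeof *payload);
            return;
        }
        /* Pad to the buffer boundary with a filler event, then retry. */
        slot[0] = make_header(EV_FILLER, room);
    }
}

/* Random access: parse buffer buf from its first word and count the
 * events with the given id. A zero length marks unwritten space. */
static unsigned count_events(unsigned buf, uint32_t id)
{
    const uint64_t *p = &trace_log[(buf % NUM_BUFS) * BUF_WORDS];
    unsigned count = 0;
    uint32_t off = 0;

    while (off < BUF_WORDS) {
        uint32_t w = header_words(p[off]);
        if (w == 0)
            break;
        if (header_id(p[off]) == id)
            count++;
        off += w;
    }
    return count;
}
```

A caller would invoke `log_event(id, payload, n)` from any thread and later walk a chosen buffer with a loop like the one in `count_events`. The sketch leaves out timestamps, per-processor logs, and any way for a reader to tell that a reserved event has been completely written or that the circular log has wrapped; a usable facility would need all of these.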