Programming, understanding, and tuning the performance of large multiprocessor systems is challenging. Even experts have difficulty achieving good utilization for applications on large machines, and implementing a scalable system such as an operating system or database on such machines is harder still. The importance of achieving good performance on multiprocessor machines is growing as the number of cores per chip and the size of multiprocessors increase. Understanding the behavior of the system is crucial to achieving good performance. We have developed an efficient, unified, and scalable tracing infrastructure that supports correctness debugging, performance debugging, and performance monitoring of an operating system. The infrastructure allows variable-length events to be logged without locking, provides random access to the event stream, and allows cheap, parallel logging of events by applications, libraries, servers, and the kernel. It was designed for K42, a new open-source research kernel designed to scale near perfectly on large cache-coherent 64-bit multiprocessor systems. The techniques are generally applicable, and many of them have been integrated into the Linux Trace Toolkit. In this paper, we describe the implementation of the infrastructure, how we used it (for example, to analyze lock contention) to understand and achieve K42's scalable performance, and the lessons we learned. The infrastructure has been invaluable in achieving K42's scalability.
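To make the idea of logging variable-length events without locking more concrete, the following is a minimal sketch in C11. It is not K42's code: the names `trace_buf`, `trace_log`, `trace_event_count`, the buffer size, and the header layout (48-bit event id, 16-bit length in words) are all assumptions chosen for illustration. The sketch reserves space with an atomic compare-and-swap on a shared index, so concurrent writers never take a lock, and it assumes readers run only after writers have finished.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define TRACE_WORDS 1024u   /* buffer capacity in 64-bit words (hypothetical) */

typedef struct {
    _Atomic uint64_t next;          /* index of the next free word */
    uint64_t words[TRACE_WORDS];    /* event storage */
} trace_buf;

/* Hypothetical header layout: [ event id : 48 bits | length in words : 16 bits ].
 * The length counts the header word itself. */
static inline uint64_t make_header(uint64_t id, uint64_t len)
{
    return (id << 16) | (len & 0xffffu);
}
static inline uint64_t header_len(uint64_t h) { return h & 0xffffu; }
static inline uint64_t header_id(uint64_t h)  { return h >> 16; }

void trace_init(trace_buf *b)
{
    assert(b != NULL);
    atomic_init(&b->next, 0);
    memset(b->words, 0, sizeof b->words);  /* length 0 marks "not yet written" */
}

/* Log one event with n payload words. Returns the index of the event's
 * header word, or -1 if the event does not fit. No lock is taken: space is
 * reserved by atomically advancing `next` with compare-and-swap. */
int64_t trace_log(trace_buf *b, uint64_t id, const uint64_t *payload, uint64_t n)
{
    if (n >= 0xffffu)
        return -1;                          /* length must fit in 16 bits */
    uint64_t len = n + 1;                   /* payload plus header word */
    uint64_t old = atomic_load(&b->next);
    do {
        if (old + len > TRACE_WORDS)
            return -1;                      /* buffer full; a real tracer would wrap */
    } while (!atomic_compare_exchange_weak(&b->next, &old, old + len));

    /* The range [old, old + len) now belongs to this writer alone. */
    if (n > 0)
        memcpy(&b->words[old + 1], payload, n * sizeof(uint64_t));
    b->words[old] = make_header(id, len);   /* written last to mark the event complete */
    return (int64_t)old;
}

/* Walk the stream header by header and count completed events.
 * Assumes all writers have finished (for example, after joining threads). */
uint64_t trace_event_count(trace_buf *b)
{
    uint64_t end = atomic_load(&b->next);
    uint64_t count = 0;
    for (uint64_t i = 0; i < end; ) {
        uint64_t len = header_len(b->words[i]);
        if (len == 0)
            break;                          /* reserved but never written */
        count++;
        i += len;
    }
    return count;
}
```

Because each writer owns the range it reserved, writers on different processors can fill in their events in parallel. The sketch stops when the buffer is full and does not address random access to the stream; a production tracer would need to handle wraparound and let a reader begin decoding from points other than the start of the buffer.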