Lockless multi-core high-throughput buffering scheme for kernel tracing

Authors:
Mathieu Desnoyers; Michel R. Dagenais
Affiliations:
EfficiOS Inc.; Ecole Polytechnique de Montreal, Montreal, Quebec, Canada
Venue:
ACM SIGOPS Operating Systems Review
Year:
2012

Abstract

Studying the execution of concurrent real-time online systems to identify far-reaching, hard-to-reproduce latency and performance problems requires a mechanism able to cope with the voluminous information extracted from execution traces. Furthermore, tracing must not disturb the workload, since doing so could make the problematic behavior unreproducible. To satisfy this low-disturbance constraint, we created the LTTng kernel tracer. It is designed to enable safe and race-free attachment of probes virtually anywhere in the operating system, including sites executed in non-maskable interrupt context. In addition to being reentrant with respect to all kernel execution contexts, LTTng offers good performance and scalability, mainly due to its use of per-CPU data structures, local atomic operations as the main buffer synchronization primitive, and the RCU (Read-Copy Update) mechanism to control tracing. Given that kernel infrastructure used by the tracer could lead to infinite recursion if traced, and typically requires non-atomic synchronization, this paper proposes an asynchronous mechanism to inform the kernel that a buffer is ready to read. This ensures that tracing sites do not require any kernel primitive, and therefore protects against infinite recursion. This paper presents the core of LTTng's buffering algorithms and measures their performance.
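
The ideas named in the abstract (a per-CPU buffer, atomic operations instead of locks, and a deferred "buffer ready" notification) can be illustrated with a minimal user-space sketch in C11. This is not LTTng's code: every name below (`cpu_buffer`, `trace_event`, `reader_poll`, the size constants) is hypothetical, events are assumed to be fixed-size, events are discarded when the buffer is full, and portable C11 atomics stand in for the cheaper CPU-local atomic operations the paper refers to. The polling reader is only one plausible way to picture the asynchronous notification.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, chosen only for illustration. */
#define EVENT_SIZE  16u
#define SUBBUF_SIZE 64u
#define NUM_SUBBUF  4u
#define BUF_SIZE    (SUBBUF_SIZE * NUM_SUBBUF)

/* One instance per CPU, so writers on different CPUs never contend. */
struct cpu_buffer {
    atomic_size_t reserved;              /* total bytes reserved so far */
    atomic_size_t committed[NUM_SUBBUF]; /* cumulative bytes committed per sub-buffer */
    atomic_size_t consumed;              /* total bytes released by the reader */
    atomic_size_t lost;                  /* events dropped because the buffer was full */
    atomic_bool   wakeup_pending;        /* set by the writer, polled by the reader */
    unsigned char data[BUF_SIZE];
};

/* Tracing site: reserve space with a compare-and-swap loop, copy, commit.
 * No lock is taken and nothing is called that could itself be traced, so a
 * nested writer (e.g. an interrupt) simply reserves the next slot. */
static bool trace_event(struct cpu_buffer *b, const void *payload)
{
    size_t old = atomic_load_explicit(&b->reserved, memory_order_relaxed);
    do {
        size_t consumed = atomic_load_explicit(&b->consumed, memory_order_acquire);
        if (old - consumed >= BUF_SIZE) {          /* full: discard the event */
            atomic_fetch_add_explicit(&b->lost, 1, memory_order_relaxed);
            return false;
        }
    } while (!atomic_compare_exchange_weak_explicit(&b->reserved, &old,
                 old + EVENT_SIZE, memory_order_relaxed, memory_order_relaxed));

    memcpy(&b->data[old % BUF_SIZE], payload, EVENT_SIZE);

    /* Commit counts are per sub-buffer, so out-of-order commits by nested
     * writers still add up to "full" exactly once per pass. */
    size_t idx  = (old / SUBBUF_SIZE) % NUM_SUBBUF;
    size_t done = atomic_fetch_add_explicit(&b->committed[idx], EVENT_SIZE,
                                            memory_order_release) + EVENT_SIZE;
    if (done % SUBBUF_SIZE == 0)                   /* sub-buffer complete */
        atomic_store_explicit(&b->wakeup_pending, true, memory_order_release);
    return true;
}

/* Reader side, run later and outside the tracing path (for instance from a
 * periodic timer). Copies every fully committed sub-buffer into `out`, which
 * must hold at least BUF_SIZE bytes, and returns the number of bytes copied. */
static size_t reader_poll(struct cpu_buffer *b, unsigned char *out)
{
    if (!atomic_exchange_explicit(&b->wakeup_pending, false, memory_order_acquire))
        return 0;

    size_t copied = 0;
    for (unsigned i = 0; i < NUM_SUBBUF; i++) {
        size_t consumed = atomic_load_explicit(&b->consumed, memory_order_relaxed);
        size_t idx  = (consumed / SUBBUF_SIZE) % NUM_SUBBUF;
        size_t want = (consumed / BUF_SIZE + 1) * SUBBUF_SIZE;
        if (atomic_load_explicit(&b->committed[idx], memory_order_acquire) < want)
            return copied;                         /* next sub-buffer not ready */
        memcpy(out + copied, &b->data[consumed % BUF_SIZE], SUBBUF_SIZE);
        copied += SUBBUF_SIZE;
        atomic_store_explicit(&b->consumed, consumed + SUBBUF_SIZE,
                              memory_order_release);
    }
    /* Hit the per-poll bound: ask to be polled again in case more is ready. */
    atomic_store_explicit(&b->wakeup_pending, true, memory_order_release);
    return copied;
}
```

In this sketch the writer never blocks and never wakes anyone directly: it only sets a flag, and the reader discovers completed sub-buffers on its own schedule. That separation is what keeps the tracing path free of kernel primitives that might recurse into the tracer.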