Analyzing blocking to debug performance problems on multi-core systems

Authors: Pierre-Marc Fournier; Michel R. Dagenais
Affiliations: École Polytechnique de Montréal, Montréal, Québec, Canada (both authors)
Venue: ACM SIGOPS Operating Systems Review
Year: 2010

Abstract

Multi-core systems are rapidly becoming more prevalent. Consequently, developers frequently face performance bugs caused by unexpected interactions between parallel software components. These bugs are difficult to locate with current tools, because the process exhibiting the slowness may be separated from the root cause of the problem by a blocking chain involving several other processes. This article introduces a new approach for analyzing blocking on multi-core systems and reports on its implementation in the LTTV Delay Analyzer. The approach enables developers to quickly understand the dependencies among processes and to see how the total elapsed time is divided into its main components. The LTTV Delay Analyzer was used to analyze and rapidly correct complex performance problems, which was not possible with existing tools. The Linux Trace Toolkit, LTTng, is used for most of the instrumentation and for trace recording, allowing production systems to be traced with great accuracy and minimal performance impact. The approach relies solely on kernel instrumentation and requires neither instrumenting nor recompiling the analyzed processes. Analysis time is linear in the trace size.
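The abstract combines two ideas: following a blocking chain from a slow process to the process that actually holds it up, and splitting elapsed time into components with a single linear pass over a kernel trace. The Python sketch below illustrates both under a deliberately simplified, hypothetical event model (`run`, `preempt`, `block` and `wakeup` events carrying a timestamp and a pid, plus the waker's pid on wakeups). It is not the LTTV Delay Analyzer's algorithm nor LTTng's trace format; the names `Event`, `Analysis`, `analyze`, `blocking_chain` and `sample_events` are invented for illustration, and a real analysis would also have to handle interrupts, waits on hardware, and per-CPU event streams merged by timestamp.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified event kinds (NOT the LTTng event format):
#   "run"     pid starts executing on some CPU
#   "preempt" pid stops executing but stays runnable
#   "block"   pid stops executing and sleeps until woken
#   "wakeup"  pid becomes runnable; `waker` is the process that woke it


@dataclass
class Event:
    ts: int                      # timestamp (e.g. ns); events are sorted by ts
    kind: str
    pid: int
    waker: Optional[int] = None  # only meaningful for "wakeup"


@dataclass
class Analysis:
    # Elapsed time per pid, split into "running", "waiting for CPU" and
    # "blocked on <waker pid>"; intervals still open at trace end are ignored.
    breakdown: Dict[int, Dict[str, int]]
    # Completed blocking intervals per pid: (start, end, waker)
    blocked: Dict[int, List[Tuple[int, int, int]]]


def analyze(events: List[Event]) -> Analysis:
    """One pass over time-ordered events: O(1) work per event."""
    breakdown = defaultdict(lambda: defaultdict(int))
    blocked = defaultdict(list)
    state: Dict[int, Tuple[str, int]] = {}  # pid -> (state, since timestamp)
    for ev in events:
        prev = state.get(ev.pid)
        if ev.kind == "run":
            if prev is not None and prev[0] == "waiting":
                breakdown[ev.pid]["waiting for CPU"] += ev.ts - prev[1]
            state[ev.pid] = ("running", ev.ts)
        elif ev.kind in ("preempt", "block"):
            if prev is not None and prev[0] == "running":
                breakdown[ev.pid]["running"] += ev.ts - prev[1]
            state[ev.pid] = ("waiting" if ev.kind == "preempt" else "blocked", ev.ts)
        elif ev.kind == "wakeup" and prev is not None and prev[0] == "blocked":
            # Attribute the whole blocked interval to the process that woke us.
            breakdown[ev.pid][f"blocked on {ev.waker}"] += ev.ts - prev[1]
            blocked[ev.pid].append((prev[1], ev.ts, ev.waker))
            state[ev.pid] = ("waiting", ev.ts)
    return Analysis({p: dict(d) for p, d in breakdown.items()}, dict(blocked))


def blocking_chain(a: Analysis, pid: int, t: int) -> List[int]:
    """Follow wakers transitively: whom was `pid` waiting on at time t?"""
    chain, seen = [pid], {pid}
    while True:
        waker = next((w for s, e, w in a.blocked.get(pid, []) if s <= t < e), None)
        if waker is None or waker in seen:  # chain ends, or a cycle is detected
            return chain
        chain.append(waker)
        seen.add(waker)
        pid = waker


# Hypothetical scenario: pid 1 waits on pid 2, which itself waits on pid 3.
sample_events = [
    Event(0, "run", 1), Event(0, "run", 2), Event(0, "run", 3),
    Event(10, "block", 1),
    Event(12, "block", 2),
    Event(40, "wakeup", 2, waker=3), Event(40, "run", 2),
    Event(50, "wakeup", 1, waker=2),
    Event(55, "run", 1), Event(60, "preempt", 1),
]
result = analyze(sample_events)
print(result.breakdown[1])            # {'running': 15, 'blocked on 2': 40, 'waiting for CPU': 5}
print(blocking_chain(result, 1, 20))  # [1, 2, 3]
```

Attributing each blocked interval to the process that issued the wakeup is what makes the chain traversable in this sketch: process 1 on its own only reports 40 time units "blocked on 2", and the second hop reveals that process 2 was in turn blocked on process 3 during that period.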