We present a three-part approach for diagnosing bugs and performance problems in production distributed environments. First, we introduce a novel execution monitoring technique that dynamically injects a fragment of code, the agent, into an application process on demand. The agent inserts instrumentation ahead of the control flow within the process and propagates into other processes, following communication events, crossing host boundaries, and collecting a distributed function-level trace of the execution. Second, we present an algorithm that separates the trace into user-meaningful activities called flows. This step simplifies manual examination and enables automated analysis of the trace. Finally, we describe our automated root cause analysis technique that compares the flows to help the analyst locate an anomalous flow and identify a function in that flow that is a likely cause of the anomaly. We demonstrate the effectiveness of our techniques by diagnosing two complex problems in the Condor distributed scheduling system.
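The abstract does not spell out the flow-separation rules or the comparison metric. The Python sketch below illustrates one plausible reading, and every detail in it is an assumption made for illustration, not necessarily the authors' algorithm. The trace format is hypothetical: each process has a time-ordered list of `(kind, arg, duration)` events. A `recv` opens a new segment of the receiving process, and that segment joins the flow of the segment that sent the message. Each flow is summarized as the fraction of call time spent in each function. Flows are compared by Manhattan distance. The flow farthest from its k-th nearest neighbour is flagged as anomalous, and the function whose time share differs most from that flow's nearest neighbour is reported as the likely cause.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest used to merge trace segments into flows."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def split_into_flows(trace):
    """Group events into flows (assumed rule: a recv joins the sender's flow).

    `trace` maps a hypothetical process id to its time-ordered events, each a
    tuple (kind, arg, duration): kind "call" carries a function name,
    "send"/"recv" carry a message id.
    """
    uf = UnionFind()
    segments = defaultdict(list)  # (proc, segment number) -> events
    sender, receiver = {}, {}     # message id -> segment that sent / received it
    for proc, events in trace.items():
        seg = (proc, 0)
        for kind, arg, dur in events:
            if kind == "recv":
                seg = (proc, seg[1] + 1)  # a receive starts a new segment
                receiver[arg] = seg
            elif kind == "send":
                sender[arg] = seg
            uf.find(seg)  # register the segment in the forest
            segments[seg].append((kind, arg, dur))
    for msg, seg in receiver.items():
        if msg in sender:
            uf.union(seg, sender[msg])  # follow the communication event
    flows = defaultdict(list)
    for seg, events in segments.items():
        flows[uf.find(seg)].extend(events)
    return list(flows.values())


def time_profile(flow):
    """Fraction of a flow's total call time spent in each function."""
    totals = defaultdict(float)
    for kind, arg, dur in flow:
        if kind == "call":
            totals[arg] += dur
    grand = sum(totals.values()) or 1.0
    return {f: t / grand for f, t in totals.items()}


def distance(p, q):
    """Manhattan distance between two sparse profiles."""
    return sum(abs(p.get(f, 0.0) - q.get(f, 0.0)) for f in set(p) | set(q))


def find_anomaly(flows, k=1):
    """Return (index of the most anomalous flow, suspect function).

    Requires more than k flows. A flow's score is its distance to its k-th
    nearest neighbour; the suspect is the function whose time share differs
    most from that of the flow's nearest neighbour.
    """
    profiles = [time_profile(f) for f in flows]
    best, best_score, best_nn = None, -1.0, None
    for i, p in enumerate(profiles):
        others = sorted((distance(p, q), j) for j, q in enumerate(profiles) if j != i)
        if others[k - 1][0] > best_score:
            best, best_score, best_nn = i, others[k - 1][0], others[0][1]
    p, q = profiles[best], profiles[best_nn]
    suspect = max(set(p) | set(q), key=lambda f: abs(p.get(f, 0.0) - q.get(f, 0.0)))
    return best, suspect


# Hypothetical example: three clients submit jobs; one request stalls on a lock.
trace = {
    "client1": [("call", "submit", 1.0), ("send", "m1", 0.0)],
    "client2": [("call", "submit", 1.0), ("send", "m2", 0.0)],
    "client3": [("call", "submit", 1.0), ("send", "m3", 0.0)],
    "server": [
        ("recv", "m1", 0.0), ("call", "schedule", 2.0),
        ("recv", "m2", 0.0), ("call", "schedule", 2.0),
        ("recv", "m3", 0.0), ("call", "schedule", 2.0), ("call", "lock_wait", 9.0),
    ],
}
flows = split_into_flows(trace)
idx, suspect = find_anomaly(flows)
print(idx, suspect)  # prints: 2 lock_wait
```

Time-fraction profiles make flows of different lengths comparable, and the k-th nearest neighbour score tolerates a few similar anomalies when k is raised above 1. Both are illustrative design choices rather than details taken from the abstract.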