Stardust: tracking activity in a distributed storage system

  • Authors:
  • Eno Thereska;Brandon Salmon;John Strunk;Matthew Wachs;Michael Abd-El-Malek;Julio Lopez;Gregory R. Ganger

  • Affiliations:
  • Carnegie Mellon University (all authors)

  • Venue:
  • SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
  • Year:
  • 2006

Abstract

Performance monitoring in most distributed systems provides minimal guidance for tuning, problem diagnosis, and decision making. Stardust is a monitoring infrastructure that replaces traditional performance counters with end-to-end traces of requests and allows for efficient querying of performance metrics. Such traces better inform key administrative performance challenges by enabling, for example, extraction of per-workload, per-resource demand information and per-workload latency graphs. This paper reports on our experience building and using end-to-end tracing as an on-line monitoring tool in a distributed storage system. Using diverse system workloads and scenarios, we show that such fine-grained tracing can be made efficient (less than 6% overhead) and is useful for on- and off-line analysis of system behavior. These experiences make a case for having other systems incorporate such an instrumentation framework.
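
To make the contrast with aggregate performance counters concrete, the following is a minimal sketch of how per-request trace records could be reduced to per-workload, per-resource demand and to end-to-end request latencies. It is an illustration only, not Stardust's actual record format or query interface: the `TraceRecord` fields, the function names, and the sample data are all assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    """Hypothetical trace record; the field names are assumptions."""
    request_id: int  # ties records from different components to one request
    workload: str    # workload that issued the request
    resource: str    # resource used, e.g. "cpu", "disk", "network"
    start: float     # time the resource use began (seconds)
    end: float       # time the resource use ended (seconds)


def demand_by_workload(records):
    """Sum the time each workload spent on each resource."""
    demand = defaultdict(float)
    for r in records:
        demand[(r.workload, r.resource)] += r.end - r.start
    return dict(demand)


def latency_by_request(records):
    """End-to-end latency: earliest start to latest end per request."""
    spans = {}
    for r in records:
        lo, hi = spans.get(r.request_id, (r.start, r.end))
        spans[r.request_id] = (min(lo, r.start), max(hi, r.end))
    return {rid: hi - lo for rid, (lo, hi) in spans.items()}


# Made-up sample data: two requests from two workloads.
records = [
    TraceRecord(1, "oltp", "cpu", 0.0, 0.25),
    TraceRecord(1, "oltp", "disk", 0.25, 1.0),
    TraceRecord(2, "backup", "disk", 0.5, 2.5),
    TraceRecord(2, "backup", "network", 2.5, 3.0),
]

demand = demand_by_workload(records)
latency = latency_by_request(records)
```

A single disk-busy counter would report only the combined disk time of both workloads. Because each record here carries a request and workload identifier, the same data can be split per workload after the fact.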