Hunting for problems with Artemis

  • Authors:
  • Gabriela F. Creţu-Ciocârlie; Mihai Budiu; Moises Goldszmidt

  • Affiliations:
  • Microsoft Research, Silicon Valley; Microsoft Research, Silicon Valley; Microsoft Research, Silicon Valley

  • Venue:
  • WASL'08: Proceedings of the First USENIX Conference on Analysis of System Logs
  • Year:
  • 2008

Abstract

Artemis is a modular application designed for analyzing and troubleshooting the performance of large clusters running datacenter services. Artemis is composed of four modules: (1) distributed log collection and data extraction, (2) a database storing the extracted data, (3) an interactive visualization tool for exploring the data, and (4) a plug-in interface (and a set of sample plug-ins) allowing users to implement data analysis tools, including (a) the extraction and construction of new features from the basic measurements collected, and (b) the implementation and invocation of statistical and machine learning algorithms and tools. In this paper we describe each of these components and then illustrate the power of the plug-in architecture by presenting a case study that uses Artemis to analyze a Dryad application running on a 240-machine cluster.
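The two plug-in roles named in module (4) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the names `FeaturePlugin`, `AnalysisPlugin`, `PluginHost`, and `zscore_outliers`, the per-machine record schema, and the sample numbers are illustrative assumptions, not Artemis's actual interface, which the abstract does not describe. Role (a) is shown as deriving a new feature (throughput) from two basic measurements. Role (b) is shown as invoking a simple statistical test, a z-score outlier check that stands in for the statistical and machine learning tools the paper mentions.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Dict, List

# One row of extracted measurements for a single machine (hypothetical schema).
Record = Dict[str, float]


@dataclass
class FeaturePlugin:
    """Hypothetical plug-in of kind (a): derives a new feature from basic measurements."""
    name: str
    compute: Callable[[Record], float]


@dataclass
class AnalysisPlugin:
    """Hypothetical plug-in of kind (b): runs a statistical test over one feature column."""
    name: str
    run: Callable[[List[str], List[float]], List[str]]


class PluginHost:
    """Holds registered plug-ins and applies them to per-machine records."""

    def __init__(self) -> None:
        self.features: List[FeaturePlugin] = []
        self.analyses: List[AnalysisPlugin] = []

    def register_feature(self, plugin: FeaturePlugin) -> None:
        self.features.append(plugin)

    def register_analysis(self, plugin: AnalysisPlugin) -> None:
        self.analyses.append(plugin)

    def apply_features(self, rows: Dict[str, Record]) -> None:
        # Add each derived feature as a new column, in place, to every machine's record.
        for record in rows.values():
            for feature in self.features:
                record[feature.name] = feature.compute(record)

    def analyze(self, rows: Dict[str, Record], column: str) -> Dict[str, List[str]]:
        # Run every analysis plug-in on one column; machines are visited in sorted order.
        machines = sorted(rows)
        values = [rows[m][column] for m in machines]
        return {a.name: a.run(machines, values) for a in self.analyses}


def zscore_outliers(threshold: float) -> Callable[[List[str], List[float]], List[str]]:
    """Flags machines more than `threshold` population standard deviations from the mean."""
    def run(machines: List[str], values: List[float]) -> List[str]:
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            return []  # all machines identical: nothing stands out
        return [m for m, v in zip(machines, values) if abs(v - mu) / sigma > threshold]
    return run


# Usage with made-up numbers: four machines at 100 bytes/s, one straggler at 20 bytes/s.
rows = {f"m0{i}": {"bytes_read": 1000.0, "seconds": 10.0} for i in range(1, 5)}
rows["m05"] = {"bytes_read": 1000.0, "seconds": 50.0}

host = PluginHost()
host.register_feature(FeaturePlugin("throughput", lambda r: r["bytes_read"] / r["seconds"]))
host.register_analysis(AnalysisPlugin("zscore", zscore_outliers(threshold=1.5)))
host.apply_features(rows)
result = host.analyze(rows, "throughput")
print(result)  # {'zscore': ['m05']}
```

Keeping feature construction separate from analysis lets one derived column feed several algorithms. The z-score test is only a placeholder: with the population standard deviation, the largest z-score attainable among n machines is sqrt(n - 1), so the threshold has to be chosen with the cluster size in mind.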