Theia: Visual Signatures for Problem Diagnosis in Large Hadoop Clusters

  • Authors:
  • Elmer Garduno;Soila P. Kavulya;Jiaqi Tan;Rajeev Gandhi;Priya Narasimhan

  • Affiliations:
  • Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University

  • Venue:
  • LISA '12: Proceedings of the 26th International Conference on Large Installation System Administration: Strategies, Tools, and Techniques
  • Year:
  • 2012

Abstract

Diagnosing performance problems in large distributed systems can be daunting because the copious volume of monitoring information available can obscure the root cause of the problem. Automated diagnosis tools help narrow down the possible root causes; however, these tools are not perfect, which motivates the need for visualization tools that allow users to explore their data and gain insight into the root cause. In this paper, we describe Theia, a visualization tool that analyzes application-level logs in a Hadoop cluster and generates visual signatures of each job's performance. These visual signatures provide compact representations of task durations, task status, and data consumption by jobs. We demonstrate the utility of Theia on real incidents experienced by users on a production Hadoop cluster.
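
As a rough illustration of what a compact per-job signature might aggregate, the following minimal sketch summarizes task durations, task status, and data consumption per node for one job. It is not Theia's implementation: the abstract does not describe the log format or the rendering, so the `TaskRecord` fields, the status labels, and the `job_signature` function are all hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """One task attempt, as it might be parsed from application-level logs.

    All field names and status labels here are assumptions for illustration.
    """
    job_id: str
    node: str
    duration_s: float
    status: str       # assumed labels: "SUCCESS", "FAILED", "KILLED"
    bytes_read: int


def job_signature(records, job_id):
    """Summarize one job per node: mean duration, failure ratio, bytes read."""
    totals = defaultdict(
        lambda: {"tasks": 0, "failed": 0, "duration_s": 0.0, "bytes_read": 0}
    )
    for rec in records:
        if rec.job_id != job_id:
            continue
        cell = totals[rec.node]
        cell["tasks"] += 1
        cell["failed"] += 1 if rec.status != "SUCCESS" else 0
        cell["duration_s"] += rec.duration_s
        cell["bytes_read"] += rec.bytes_read

    # Each node becomes one "cell" of the signature; a visualization could
    # map these three values to size, color, and shading, for example.
    return {
        node: {
            "mean_duration_s": cell["duration_s"] / cell["tasks"],
            "failure_ratio": cell["failed"] / cell["tasks"],
            "bytes_read": cell["bytes_read"],
        }
        for node, cell in totals.items()
    }


records = [
    TaskRecord("job_1", "node-a", 10.0, "SUCCESS", 100),
    TaskRecord("job_1", "node-a", 20.0, "FAILED", 50),
    TaskRecord("job_1", "node-b", 30.0, "SUCCESS", 200),
    TaskRecord("job_2", "node-a", 99.0, "SUCCESS", 999),
]
signature = job_signature(records, "job_1")
```

In this toy input, a node whose cell shows a high failure ratio or an unusually long mean duration relative to its peers would be the natural place for a user to start exploring.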