Semi-automated data center hotspot diagnosis

  • Authors:
  • S. McIntosh; J. O. Kephart; J. Lenchner; B. Yang; M. Feridun; M. Nidd; A. Tanner; I. Barabasi

  • Affiliations:
  • IBM Thomas J. Watson Research Center; IBM Thomas J. Watson Research Center; IBM Thomas J. Watson Research Center; IBM China Research Lab, China; IBM Zurich Research Lab, Switzerland; IBM Zurich Research Lab, Switzerland; IBM Zurich Research Lab, Switzerland; IBM Green Innovations Data Center

  • Venue:
  • Proceedings of the 7th International Conference on Network and Services Management
  • Year:
  • 2011

Abstract

An increasingly important requirement for energy-efficient data center operation is to diagnose and fix thermal anomalies that sometimes occur due to excessive workload or equipment failures. Today, the task of diagnosing thermal anomalies entails expert but tedious analysis of data collected manually from disparate management systems. Our ultimate goal is to substantially reduce the time, tedium and expertise required to diagnose thermal hotspots by developing a system that generates accurate diagnoses automatically. We describe a substantial step towards this goal: a loosely-coupled, semi-automated thermal diagnosis system that integrates IT and facilities data, uses simple heuristics to highlight the most likely culprits, and provides a graphical interface that enables an administrator to narrow the list further by exploring data correlations. Among the challenges addressed by our solution are coping with heterogeneous data types and data access methods, and detecting and managing erroneous sensor readings.
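
The abstract does not specify which heuristics the system uses, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes two things that are not in the source: that erroneous sensor readings can be caught with a simple plausibility range (the bounds of 5 to 60 degrees Celsius are arbitrary), and that candidate culprits can be ranked by how strongly their power draw correlates with the hotspot temperature. All function names, device names, and data values are hypothetical.

```python
from statistics import mean


def plausible(readings, low=5.0, high=60.0):
    """Replace readings outside an assumed plausible range (deg C) with None."""
    return [r if low <= r <= high else None for r in readings]


def pearson(xs, ys):
    """Pearson correlation over the positions where both series have a value."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (vx * vy) ** 0.5


def rank_culprits(hotspot_temps, candidates):
    """Order candidate devices by correlation with the cleaned hotspot series."""
    cleaned = plausible(hotspot_temps)
    scores = {name: pearson(cleaned, series) for name, series in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: one hotspot sensor with a faulty reading (999.0)
# and the power draw in watts of two nearby servers.
hotspot = [24.0, 25.0, 999.0, 29.0, 31.0]
candidates = {
    "server_a_power_w": [200, 220, 230, 300, 340],
    "server_b_power_w": [310, 300, 305, 295, 290],
}
ranking = rank_culprits(hotspot, candidates)
```

In this toy data the faulty reading is masked before correlation, and the server whose power draw rises with the hotspot temperature is listed first. A real system would also have to reconcile timestamps and units across the IT and facilities data sources, which this sketch ignores.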