PDA: a tool for automated problem determination

  • Authors:
  • Hai Huang; Raymond Jennings, III; Yaoping Ruan; Ramendra Sahoo; Sambit Sahu; Anees Shaikh

  • Affiliations:
  • IBM T. J. Watson Research Center; IBM T. J. Watson Research Center; IBM T. J. Watson Research Center; IBM T. J. Watson Research Center; IBM T. J. Watson Research Center; IBM T. J. Watson Research Center

  • Venue:
  • LISA'07: Proceedings of the 21st Large Installation System Administration Conference
  • Year:
  • 2007

Abstract

Problem determination remains one of the most expensive and time-consuming functions in system management because it is difficult to automate what is essentially a highly experience-dependent task. In this paper we study the characteristics of problem tickets in an enterprise IT infrastructure and observe that most tickets come from a small number of products and modules, and that OS problems take longer to resolve. We propose PDA, a problem management tool that provides automated problem diagnosis capabilities to help system administrators solve real-world problems more efficiently. PDA uses a two-level approach: proactive, high-level system health checks, coupled with rule-based "drill-down" probing that automatically collects detailed information related to the problem. The tool allows system administrators to author and customize probes and rules and to share them across the organization. We illustrate the usage and benefits of PDA with a number of UNIX problem scenarios, which show that PDA can quickly collect key information through its rules to aid in problem determination.
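
The two-level approach described in the abstract can be illustrated with a minimal Python sketch. It is not PDA's actual implementation or rule format, which the abstract does not specify. The names `Rule`, `run_diagnosis`, and the probe names (`disk_usage`, `largest_dirs`, `cpu_load`, `top_processes`) are hypothetical, and the probes are stubbed with fixed values rather than querying a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a probe is any callable returning a dict of observations.
Probe = Callable[[], Dict[str, float]]


@dataclass
class Rule:
    """Hypothetical rule: when `condition` holds on a probe's output,
    schedule the probes listed in `drill_down`."""
    name: str
    condition: Callable[[Dict[str, float]], bool]
    drill_down: List[str] = field(default_factory=list)


def run_diagnosis(
    probes: Dict[str, Probe],
    rules: Dict[str, List[Rule]],
    health_checks: List[str],
) -> Dict[str, Dict[str, float]]:
    """Run the high-level health checks first, then follow any rules
    that fire to more detailed probes. Each probe runs at most once."""
    collected: Dict[str, Dict[str, float]] = {}
    pending = list(health_checks)
    while pending:
        probe_name = pending.pop(0)
        if probe_name in collected:
            continue
        result = probes[probe_name]()
        collected[probe_name] = result
        # Rules are keyed by the probe whose output they inspect.
        for rule in rules.get(probe_name, []):
            if rule.condition(result):
                pending.extend(rule.drill_down)
    return collected


# Stubbed probes with fixed values (illustrative only).
probes: Dict[str, Probe] = {
    "disk_usage": lambda: {"used_pct": 97.0},
    "largest_dirs": lambda: {"var_log_gb": 41.5},
    "cpu_load": lambda: {"load1": 0.3},
    "top_processes": lambda: {"busiest_cpu_pct": 2.0},
}

rules: Dict[str, List[Rule]] = {
    "disk_usage": [
        Rule("disk_nearly_full", lambda r: r["used_pct"] > 90, ["largest_dirs"]),
    ],
    "cpu_load": [
        Rule("high_load", lambda r: r["load1"] > 4, ["top_processes"]),
    ],
}

report = run_diagnosis(probes, rules, ["disk_usage", "cpu_load"])
for name, data in report.items():
    print(name, data)
```

In this sketch only the disk rule fires, so the detailed `largest_dirs` probe runs while `top_processes` is skipped. Keeping rules as plain data, separate from the probes, is one way to make them easy for administrators to author and share, which is the property the abstract emphasizes.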