Behavior-based problem localization for parallel file systems

  • Authors:
  • Michael P. Kasick; Rajeev Gandhi; Priya Narasimhan

  • Affiliations:
  • Electrical & Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA (all three authors)

  • Venue:
  • HotDep'10: Proceedings of the Sixth International Conference on Hot Topics in System Dependability
  • Year:
  • 2010

Abstract

We present a behavior-based problem-diagnosis approach for PVFS (the Parallel Virtual File System) that analyzes a novel source of instrumentation, namely CPU instruction-pointer samples and function-call traces, to localize the faulty server and to enable root-cause analysis of the resource at fault. We validate our approach by injecting realistic storage and network problems into three different workloads (dd, IOzone, and PostMark) on a PVFS cluster.
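The abstract does not spell out how per-server behavior is compared. One plausible reading, offered only as a hypothetical sketch and not as the authors' algorithm, is peer comparison. Fault-free servers handling the same workload should spend their CPU time in similar functions, so a server whose instruction-pointer profile diverges from its peers becomes a localization candidate. The function with the largest excess share on that server then hints at the resource at fault. The sketch assumes each server's samples have already been aggregated into per-function counts. The function names (`disk_write`, `net_recv`, `request_sched`), the L1 distance, the median-of-peers score, and the 0.2 threshold are all illustrative assumptions.

```python
from statistics import median

# Hypothetical sketch of peer-comparison localization over per-function
# sample counts; not the method described in the paper.

def normalize(counts):
    """Turn raw per-function sample counts into a probability distribution."""
    total = sum(counts.values())
    return {fn: c / total for fn, c in counts.items()} if total else {}

def l1_distance(p, q):
    """L1 distance between two sparse distributions (assumed metric)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def localize(profiles, threshold):
    """Flag servers whose median distance to their peers exceeds threshold."""
    dists = {s: normalize(c) for s, c in profiles.items()}
    flagged = []
    for s, p in dists.items():
        peer_dists = [l1_distance(p, q) for t, q in dists.items() if t != s]
        score = median(peer_dists)
        if score > threshold:
            flagged.append((s, score))
    # Most anomalous server first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)

def top_deviating_function(profiles, server):
    """Function whose share on `server` most exceeds the peers' median share."""
    dists = {s: normalize(c) for s, c in profiles.items()}
    p = dists[server]
    peers = [q for t, q in dists.items() if t != server]
    keys = set(p).union(*peers)

    def excess(fn):
        return p.get(fn, 0.0) - median(q.get(fn, 0.0) for q in peers)

    return max(keys, key=excess)

# Illustrative data: server3 spends far more of its samples in a
# hypothetical disk-write routine than its peers do.
profiles = {
    "server0": {"request_sched": 500, "disk_write": 300, "net_recv": 200},
    "server1": {"request_sched": 510, "disk_write": 290, "net_recv": 200},
    "server2": {"request_sched": 490, "disk_write": 310, "net_recv": 200},
    "server3": {"request_sched": 200, "disk_write": 700, "net_recv": 100},
}

flagged = localize(profiles, threshold=0.2)
print([s for s, _ in flagged])                       # ['server3']
print(top_deviating_function(profiles, "server3"))   # disk_write
```

Using the median of peer distances, rather than the mean, keeps a single faulty server from inflating the scores of the healthy ones, which matters when only one server in the cluster misbehaves.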