Genesis: A Scalable Self-Evolving Performance Management Framework for Storage Systems

  • Authors: Kristal T. Pollack; Sandeep M. Uttamchandani
  • Affiliations: University of California, Santa Cruz; IBM Almaden Research Center
  • Venue: ICDCS '06 Proceedings of the 26th IEEE International Conference on Distributed Computing Systems
  • Year: 2006

Abstract

In a data-center environment, the administrator needs to understand the root cause of a performance issue. The growing trend of system virtualization, combined with the need to support end-to-end performance goals for enterprise applications, has made root-cause analysis a nontrivial problem: administrators are required to manually parse all hardware events, configuration modifications, and changes in access characteristics across all tiers of the IO path, from application servers to the disks. We propose GENESIS, a framework that assists storage administrators with root-cause analysis in distributed systems. It consists of three key modules: Abnormality Detection, Snapshot Generation, and Diagnosis. The Abnormality Detection module uses clustering algorithms to create and constantly evolve the normality models of measurable parameters in components. The Snapshot Generator is triggered by a combination of abnormality detection and policies to take compact snapshots of the system state for analysis whenever a significant change occurs. The Diagnosis module parses the snapshots and shortlists candidate root causes for the administrator using knowledge about the impact of run-time changes on IO performance. We have implemented an initial proof-of-concept of GENESIS in GPFS (a high-performance distributed file system) and validated its operation for several interesting real-world scenarios. Encouraged by the results, we are currently deploying our prototype in an existing data center environment.
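
The idea of a clustering-based normality model that keeps evolving can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm GENESIS uses, so the code below swaps in a simple online "leader" clustering over a single measurable parameter (for example, IO latency). The names `Cluster`, `NormalityModel`, `radius`, and `min_count` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    centroid: float
    count: int = 1


@dataclass
class NormalityModel:
    """Hypothetical online 1-D leader clustering; not the GENESIS algorithm."""
    radius: float        # max distance from a centroid to join that cluster
    min_count: int = 3   # observations needed before a cluster counts as normal
    clusters: list = field(default_factory=list)

    def observe(self, value: float) -> bool:
        """Fold `value` into the model and return True if it looks abnormal."""
        nearest = min(self.clusters,
                      key=lambda c: abs(c.centroid - value),
                      default=None)
        if nearest is not None and abs(nearest.centroid - value) <= self.radius:
            # Judge normality before the update, then evolve the cluster
            # by moving its centroid to the new running mean.
            was_normal = nearest.count >= self.min_count
            nearest.count += 1
            nearest.centroid += (value - nearest.centroid) / nearest.count
            return not was_normal
        # No cluster is close enough: start a new one and flag the sample.
        self.clusters.append(Cluster(centroid=value))
        return True
```

In this sketch, a value is flagged as abnormal until its cluster has been seen `min_count` times, so a sustained shift in behavior is eventually absorbed as a new normal state. An abnormal flag is the kind of event that, combined with policies, could trigger a snapshot in a design like the one described above.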