An Empirical Study on Data Center System Failure Diagnosis

  • Authors:
  • Montri Wiboonrat

  • Affiliations:
  • -

  • Venue:
  • ICIMP '08: Proceedings of the 2008 Third International Conference on Internet Monitoring and Protection
  • Year:
  • 2008

Abstract

Data center downtime causes business losses of over a million dollars per hour. Data availability 24 hours a day, 7 days a week is critical to numerous systems, e.g. public utilities, hospitals, and data centers. A service interruption can be a matter of life or death and leads to higher costs and poorer service quality. This research conducted a system failure diagnosis and reliability assessment of Tier IV data centers (DC), employing Failure Modes, Effects, and Criticality Analysis (FMECA) and the Reliability Block Diagram (RBD). Series-parallel, active standby, k-out-of-n, bridge, full redundancy, fault tolerant, and multiple utilities techniques were applied in the system failure diagnosis to provide high system availability. Component reliability data were obtained from IEEE Std 493 (the Gold Book). Simulation results from the data center system failure diagnosis reveal the functional steps leading to data center downtime and pinpoint solutions to eliminate or mitigate that downtime. Proposed improvements to the component's inherent characteristics (CIC) and the system connectivity topology (SCT) help reduce the failure rate by 1.1706 hours in 1,000,000 hours of operation.
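
The RBD configurations named in the abstract (series, parallel, k-out-of-n, bridge) have standard closed-form reliability expressions for independent components. The following is a minimal sketch of those textbook formulas, not the paper's simulation model. The function names are hypothetical, and the 0.9 component reliability and the failure rate in the usage lines are made-up illustrative values, not figures from IEEE Std 493.

```python
from math import comb, exp


def reliability(failure_rate_per_hour, hours):
    """Reliability of one component with a constant failure rate (exponential model)."""
    return exp(-failure_rate_per_hour * hours)


def series(*rs):
    """Series block: the system works only if every component works."""
    result = 1.0
    for r in rs:
        result *= r
    return result


def parallel(*rs):
    """Parallel (fully redundant) block: the system fails only if all components fail."""
    all_fail = 1.0
    for r in rs:
        all_fail *= 1.0 - r
    return 1.0 - all_fail


def k_out_of_n(k, n, r):
    """k-out-of-n block of identical components: at least k of n must work."""
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


def bridge(r1, r2, r3, r4, r5):
    """Bridge block, decomposed on the bridging component r5.

    Assumed layout: r1 -> r2 is the top path, r3 -> r4 is the bottom path,
    and r5 links the midpoints of the two paths.
    """
    r5_works = series(parallel(r1, r3), parallel(r2, r4))
    r5_fails = parallel(series(r1, r2), series(r3, r4))
    return r5 * r5_works + (1.0 - r5) * r5_fails


if __name__ == "__main__":
    r = 0.9  # illustrative component reliability, not a Gold Book value
    print("series of two:      ", series(r, r))
    print("parallel of two:    ", parallel(r, r))
    print("2-out-of-3:         ", k_out_of_n(2, 3, r))
    print("bridge of five:     ", bridge(r, r, r, r, r))
    # Hypothetical failure rate of 1e-6 per hour over one year of operation.
    print("one year at 1e-6/h: ", reliability(1e-6, 8760))
```

Nested calls compose into larger diagrams, for example `series(parallel(a, b), k_out_of_n(2, 3, c))` for a redundant pair feeding a 2-out-of-3 stage. All of these formulas assume that component failures are statistically independent.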