Latent fault detection in large scale services

  • Authors:
  • Moshe Gabel;Assaf Schuster;Ran Gilad-Bachrach;Nikolaj Bjorner

  • Affiliations:
  • Department of Computer Science, Technion - Israel Institute of Technology, Haifa, Israel;Department of Computer Science, Technion - Israel Institute of Technology, Haifa, Israel;Microsoft Research, Microsoft, Redmond, WA, USA;Microsoft Research, Microsoft, Redmond, WA, USA

  • Venue:
  • DSN '12 Proceedings of the 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
  • Year:
  • 2012

Abstract

Unexpected machine failures, with their resulting service outages and data loss, pose challenges to datacenter management. Existing failure detection techniques rely on domain knowledge, precious (often unavailable) training data, textual console logs, or intrusive service modifications.