Improving wide-area distributed system availability

  • Authors:
  • Bruno Wassermann

  • Affiliations:
  • University College London, London, UK

  • Venue:
  • Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 2
  • Year:
  • 2010

Quantified Score

Hi-index 0.01

Visualization

Abstract

The Software-as-a-Service (SaaS) paradigm and corresponding service-oriented technologies have simplified the development of larger, more complex software systems that routinely span administrative and organisational boundaries. These systems inhabit a complex operating environment with numerous threats to the dependability of service compositions. These threats include many system-level failures whose causes are difficult and time-consuming to determine. It is difficult to detect vulnerabilities to these failures prior to deployment of an application into production and applications are currently not well-equipped to handle them effectively. This results in lengthy downtimes of production systems and hence low availability. The goal of this PhD is to increase the availability of such systems by eliminating as many failures as possible before deployment and by assisting administrators to diagnose their causes more efficiently. We propose a novel monitoring technique and apply failure injection techniques that target these difficult failures and enable separate administrative domains to cooperate in handling them. Furthermore, we investigate the extent to which we can equip these systems to be self-diagnosing.