Fault Monitoring and Detection of Distributed Services over Local and Wide Area Networks

  • Authors:
  • Ella Pereira; Rubem Pereira

  • Affiliations:
  • Edge Hill University College, UK; Liverpool John Moores University, UK

  • Venue:
  • ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 2
  • Year:
  • 2006

Abstract

Software development has evolved to incorporate the reuse of software components, enabling developers to focus on requirements analysis without having to fully develop every component: existing components that provide a given functionality can be reused by various applications. A parallel development has been the availability, through the World Wide Web, of data, transactions, and communications. These developments have led to the emergence of web services, collections of reusable code that use the Web communications paradigm for wider availability and communication between applications. In this context, with services coming and going, and possibly crashing, the issue of self-healing is of great relevance: how does an application learn that a remote service has become unavailable? In this paper we consider the issue of service failure detection and replacement, paying special attention to the relationship between the time it takes to find a replacement for a service and the frequency with which the application monitors for failures.
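
The relationship between monitoring frequency and replacement time mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes the application polls the service at fixed intervals starting at time 0, that a poll coinciding with the failure observes it, and that finding a replacement takes a fixed lookup time once the failure is noticed. The function names and parameters are hypothetical.

```python
import math


def detection_delay(failure_time, poll_interval):
    """Delay between a failure and the first poll that observes it.

    Assumption: polls occur at 0, poll_interval, 2 * poll_interval, ...
    and a poll at the exact failure instant already sees the failure.
    """
    next_poll = math.ceil(failure_time / poll_interval) * poll_interval
    return next_poll - failure_time


def recovery_time(failure_time, poll_interval, lookup_time):
    """Total time from failure until a replacement service is found.

    Assumption: the replacement lookup starts only after detection and
    takes a fixed lookup_time (hypothetical parameter).
    """
    return detection_delay(failure_time, poll_interval) + lookup_time


if __name__ == "__main__":
    # Same failure instant and lookup cost, different polling intervals.
    for interval in (1, 5, 20):
        print(interval, recovery_time(failure_time=7, poll_interval=interval, lookup_time=2))
```

Under these assumptions the detection delay is always less than one polling interval, so polling more often shortens the worst-case recovery time at the cost of more monitoring traffic.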