Collaborative Fault Diagnosis in Grids through Automated Tests

  • Authors:
  • Alexandre Duarte; Francisco Brasileiro; Walfredo Cirne; Jose Alencar Filho

  • Affiliations:
  • Universidade Federal de Campina Grande, Brazil; Universidade Federal de Campina Grande, Brazil; Universidade Federal de Campina Grande, Brazil; Universidade Federal de Campina Grande, Brazil

  • Venue:
  • AINA '06 Proceedings of the 20th International Conference on Advanced Information Networking and Applications - Volume 01
  • Year:
  • 2006

Abstract

Grids have the potential to revolutionize computing by providing ubiquitous, on-demand access to computational services and resources. However, grid systems are extremely large, complex and prone to failures. A survey we conducted reveals that fault diagnosis is still a major problem for grid users. When a failure appears on the user's screen, it is very difficult for the user to tell whether the problem lies in their application, somewhere in the grid middleware, or even lower, in the fabric that comprises the grid. To overcome this problem, we argue that current grid platforms must be augmented with a collaborative diagnosis mechanism. We propose that such a mechanism use automated tests to identify the root cause of a failure and suggest the appropriate fix. We also present a Java-based implementation of the proposed mechanism, which provides a simple and flexible framework that eases the development and maintenance of the automated tests.
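To make the idea of test-based diagnosis concrete, here is a minimal sketch. It is not the authors' framework, and every name in it (`DiagnosisSketch`, `DiagnosticTest`, `Layer`, `diagnose`) is hypothetical. The sketch assumes that each automated test is tagged with the layer it probes (fabric, middleware or application) and carries a suggested fix. It also assumes that tests run from the lowest layer upward, so the first failure is reported as the likely root cause. The collaborative aspect, in which users share and maintain tests, is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of test-based fault diagnosis; not the paper's actual API.
public class DiagnosisSketch {

    // Layers mentioned in the abstract, declared from the lowest (fabric) upward.
    enum Layer { FABRIC, MIDDLEWARE, APPLICATION }

    // An automated test: a named check on one layer plus the fix to suggest if it fails.
    record DiagnosticTest(String name, Layer layer, BooleanSupplier check, String suggestedFix) {}

    // Outcome of a diagnosis: the layer and test that failed and the proposed fix.
    record Diagnosis(Layer layer, String failedTest, String suggestedFix) {}

    private final List<DiagnosticTest> tests = new ArrayList<>();

    void register(DiagnosticTest test) {
        tests.add(test);
    }

    // Runs tests layer by layer, lowest layer first; the first failing test is
    // reported as the likely root cause. Empty means every test passed.
    Optional<Diagnosis> diagnose() {
        for (Layer layer : Layer.values()) {
            for (DiagnosticTest t : tests) {
                if (t.layer() == layer && !t.check().getAsBoolean()) {
                    return Optional.of(new Diagnosis(layer, t.name(), t.suggestedFix()));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        DiagnosisSketch d = new DiagnosisSketch();
        // Illustrative checks with hard-coded results; real tests would probe the grid.
        d.register(new DiagnosticTest("node reachable", Layer.FABRIC, () -> true, "check network link"));
        d.register(new DiagnosticTest("scheduler responds", Layer.MIDDLEWARE, () -> false, "restart scheduler service"));
        d.register(new DiagnosticTest("input file present", Layer.APPLICATION, () -> false, "upload input file"));
        d.diagnose().ifPresentOrElse(
                r -> System.out.println(r.layer() + ": " + r.failedTest() + " -> " + r.suggestedFix()),
                () -> System.out.println("all tests passed"));
        // prints "MIDDLEWARE: scheduler responds -> restart scheduler service"
    }
}
```

The example reports the middleware fault rather than the application one because the lower layer is checked first. This ordering is one plausible design choice: a broken lower layer can make tests of higher layers fail spuriously.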