Fault-tolerant grid resource management infrastructure

  • Authors:
  • J. H. Abawajy; S. P. Dandamudi

  • Affiliations:
  • Carleton University, School of Computer Science, Ottawa, Ontario, Canada; Carleton University, School of Computer Science, Ottawa, Ontario, Canada

  • Venue:
  • Neural, Parallel & Scientific Computations - Special issue: Grid computing
  • Year:
  • 2004

Abstract

The main motivation for existing Grid systems is to provide mechanisms for sharing and accessing large, heterogeneous collections of remote resources, and this remains the primary goal today. However, achieving seamless large-scale distributed computing on the Grid raises not only the problems of efficient resource utilization and satisfactory response time but also the problem of fault tolerance. As Grid computing gains momentum, the ability to tolerate failures while effectively exploiting Grid resources in a scalable and transparent manner must be an integral part of the Grid computing infrastructure. In this paper, we present a reconfigurable multi-layered Grid infrastructure that provides fault-tolerance mechanisms to ensure that a Grid client can obtain reliable services even if the middleware service providing them suffers crash failures.