Extending GridSim with an architecture for failure detection

  • Authors:
  • Agustin Caminero;Anthony Sulistio;Blanca Caminero;Carmen Carrion;Rajkumar Buyya

  • Affiliations:
  • Department of Computing Systems, The University of Castilla-La Mancha, Spain;Dept. of Computer Sc.&Software Eng., The University of Melbourne, Australia;Department of Computing Systems, The University of Castilla-La Mancha, Spain;Department of Computing Systems, The University of Castilla-La Mancha, Spain;Dept. of Computer Sc.&Software Eng., The University of Melbourne, Australia

  • Venue:
  • ICPADS '07 Proceedings of the 13th International Conference on Parallel and Distributed Systems - Volume 01
  • Year:
  • 2007

Abstract

Grid technologies are emerging as the next generation of distributed computing, allowing the aggregation of resources that are geographically distributed across different locations. However, these resources are independent and managed separately by various organizations with different policies. This has a major impact on users who submit their jobs to the Grid, as they have to deal with issues such as policy heterogeneity, security and fault tolerance. Moreover, changes in Grid conditions, such as resources becoming unavailable for a period of time due to maintenance or failures, can significantly affect the Quality of Service (QoS) requirements of users. Therefore, it is essential for users to take into account the effects of resource failures during job execution.

In this paper, we present our work on introducing resource failures and failure detection into the GridSim simulation toolkit. Because we need to conduct repeatable and controlled experiments, we use simulation, which makes complex scenarios easier to study. We also give a detailed description of the overall design and a use case scenario demonstrating how the conditions of resources vary over time.