An Infrastructure for Monitoring and Management in Computational Grids

  • Authors:
  • Abdul Waheed;Warren Smith;Jude George;Jerry C. Yan

  • Venue:
  • LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
  • Year:
  • 2000

Abstract

We present the design and implementation of an infrastructure that enables monitoring of resources, services, and applications in a computational grid and provides a toolkit to help manage these entities when faults occur. This infrastructure builds on three basic monitoring components: sensors to perform measurements, actuators to perform actions, and an event service to communicate events between remote processes. We describe how we apply our infrastructure to support a grid service and an application: (1) the Globus Metacomputing Directory Service; and (2) a long-running, coarse-grained parameter study application. We use these applications to show that our monitoring infrastructure is highly modular, conveniently retargetable, and extensible.
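
The three components named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every class, method, and event name below (Event, EventService, Sensor, Actuator, "service.alive", "host-a") is a hypothetical stand-in, and the event service is simulated in a single process rather than across remote processes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    """A named measurement reported by a sensor (hypothetical structure)."""
    name: str
    source: str
    value: float


class EventService:
    """In-process stand-in for the event service that links remote processes."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = {}

    def subscribe(self, name: str, handler: Callable[[Event], None]) -> None:
        # Register a handler for all events with the given name.
        self._subscribers.setdefault(name, []).append(handler)

    def publish(self, event: Event) -> None:
        # Deliver the event to every handler subscribed to its name.
        for handler in self._subscribers.get(event.name, []):
            handler(event)


class Sensor:
    """Performs a measurement and reports it as an event."""

    def __init__(self, name: str, source: str,
                 measure: Callable[[], float], service: EventService) -> None:
        self.name = name
        self.source = source
        self.measure = measure
        self.service = service

    def poll(self) -> Event:
        event = Event(self.name, self.source, self.measure())
        self.service.publish(event)
        return event


class Actuator:
    """Performs an action in response to an event and records what it did."""

    def __init__(self, action: Callable[[Event], str]) -> None:
        self.action = action
        self.log: List[str] = []

    def fire(self, event: Event) -> None:
        self.log.append(self.action(event))


def run_demo() -> List[str]:
    service = EventService()
    restart = Actuator(lambda e: f"restart service on {e.source}")

    # Hypothetical management rule: act only when the liveness sensor reports 0.
    def on_alive(event: Event) -> None:
        if event.value == 0.0:
            restart.fire(event)

    service.subscribe("service.alive", on_alive)

    # Simulated measurements: the service is up, then down.
    readings = iter([1.0, 0.0])
    sensor = Sensor("service.alive", "host-a", lambda: next(readings), service)
    sensor.poll()
    sensor.poll()
    return restart.log
```

Calling run_demo() polls the sensor twice and triggers the actuator only on the second reading, when the simulated service is reported down. Keeping sensors and actuators unaware of each other, with the event service as their only link, is one way to obtain the modularity the abstract describes.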