An Infrastructure for Monitoring and Management in Computational Grids

Authors:
Abdul Waheed; Warren Smith; Jude George; Jerry C. Yan
Venue:
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Year:
2000



Abstract

We present the design and implementation of an infrastructure that enables monitoring of resources, services, and applications in a computational grid and provides a toolkit to help manage these entities when faults occur. This infrastructure builds on three basic monitoring components: sensors to perform measurements, actuators to perform actions, and an event service to communicate events between remote processes. We describe how we apply our infrastructure to support a grid service and an application: (1) the Globus Metacomputing Directory Service; and (2) a long-running and coarse-grained parameter study application. We use these applications to show that our monitoring infrastructure is highly modular, conveniently retargetable, and extensible.
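
As an illustration only, the relationship between the three components named in the abstract might be sketched as follows. This is not the authors' implementation or API: every class, method, and event name here (Event, EventService, Sensor, Actuator, restart_if_down, the "alive" event) is hypothetical, and the event service is reduced to an in-process publish/subscribe table rather than communication between remote processes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    """A measurement passed between components (hypothetical shape)."""
    source: str
    name: str
    value: float


class EventService:
    """In-process stand-in for an event service: routes events by name."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = {}

    def subscribe(self, name: str, handler: Callable[[Event], None]) -> None:
        self._subscribers.setdefault(name, []).append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._subscribers.get(event.name, []):
            handler(event)


class Sensor:
    """Takes a measurement through a caller-supplied probe and publishes it."""

    def __init__(self, source: str, name: str,
                 probe: Callable[[], float], events: EventService) -> None:
        self.source, self.name = source, name
        self.probe, self.events = probe, events

    def sample(self) -> Event:
        event = Event(self.source, self.name, self.probe())
        self.events.publish(event)
        return event


class Actuator:
    """Performs an action on request and records a description of it."""

    def __init__(self, action: Callable[[Event], str]) -> None:
        self.action = action
        self.log: List[str] = []

    def trigger(self, event: Event) -> None:
        self.log.append(self.action(event))


def restart_if_down(actuator: Actuator) -> Callable[[Event], None]:
    """Hypothetical fault policy: act when a liveness reading is 0."""
    def handler(event: Event) -> None:
        if event.value == 0:
            actuator.trigger(event)
    return handler


def demo() -> List[str]:
    """Wire one sensor to one actuator through the event service."""
    events = EventService()
    actuator = Actuator(lambda e: f"restart {e.source}")
    events.subscribe("alive", restart_if_down(actuator))
    readings = iter([1.0, 0.0])  # simulated probe: up, then down
    sensor = Sensor("directory-service", "alive", lambda: next(readings), events)
    sensor.sample()
    sensor.sample()
    return actuator.log
```

In this sketch the sensor and the actuator never reference each other; they are coupled only through the event service, which is one plausible reading of the modularity the abstract claims.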