We present the design and implementation of an infrastructure that monitors resources, services, and applications in a computational grid and provides a toolkit for managing these entities when faults occur. The infrastructure builds on three basic monitoring components: sensors that perform measurements, actuators that perform actions, and an event service that communicates events between remote processes. We describe how we apply the infrastructure to a grid service and to an application: (1) the Globus Metacomputing Directory Service; and (2) a long-running, coarse-grained parameter study application. We use these two cases to show that our monitoring infrastructure is highly modular, easily retargetable, and extensible.
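The sensor / actuator / event-service split described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `EventService`, `Sensor`, and `Actuator`, the `measurement` and `fault` topics, the threshold rule for declaring a fault, and the `mds-server-load` sensor name are all hypothetical. It also assumes a single process, with the event service delivering plain callbacks, whereas the infrastructure in the paper communicates events between remote processes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    """A monitoring event: topic, originating sensor, and measured value."""
    topic: str
    source: str
    value: float


class EventService:
    """Hypothetical in-process publish/subscribe bus.

    In a grid deployment this role is played by a service that carries
    events between remote processes; here subscribers are local callbacks.
    """

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, event: Event) -> None:
        # Deliver the event to every handler registered for its topic, in order.
        for handler in self._subscribers[event.topic]:
            handler(event)


class Sensor:
    """Takes a measurement and publishes it; values above the threshold are faults."""

    def __init__(self, name: str, probe: Callable[[], float], threshold: float,
                 events: EventService) -> None:
        self.name = name
        self.probe = probe
        self.threshold = threshold
        self.events = events

    def sample(self) -> float:
        value = self.probe()
        topic = "fault" if value > self.threshold else "measurement"
        self.events.publish(Event(topic, self.name, value))
        return value


class Actuator:
    """Subscribes to fault events and runs a corrective action for each one."""

    def __init__(self, action: Callable[[str], None], events: EventService) -> None:
        self.action = action
        events.subscribe("fault", self.on_fault)

    def on_fault(self, event: Event) -> None:
        self.action(event.source)


# Usage: wire one sensor and one actuator through the event service.
events = EventService()
log = []        # ordinary measurements seen by a passive consumer
restarted = []  # stands in for a real recovery action, e.g. restarting a server
events.subscribe("measurement", lambda e: log.append((e.source, e.value)))
Actuator(restarted.append, events)

readings = iter([0.4, 0.95])
sensor = Sensor("mds-server-load", lambda: next(readings), threshold=0.9, events=events)
sensor.sample()  # 0.4 is below the threshold: published as a measurement
sensor.sample()  # 0.95 exceeds it: published as a fault, so the actuator fires
print(log)        # [('mds-server-load', 0.4)]
print(restarted)  # ['mds-server-load']
```

Because sensors and actuators only share topic names through the event service, either side can be replaced or retargeted without touching the other, which is the kind of modularity the abstract claims for the real infrastructure.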