Monitoring data archives for grid environments

Authors:
Jason Lee;Dan Gunter;Martin Stoufer;Brian Tierney
Affiliations:
Lawrence Berkeley National Laboratory (all authors)
Venue:
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Year:
2002

Abstract

Developers and users of high-performance distributed systems often observe performance problems such as unexpectedly low throughput or high latency. To determine the source of these performance problems, detailed end-to-end monitoring data from applications, networks, operating systems, and hardware must be correlated across time and space. Researchers need to be able to view and compare this very detailed monitoring data from a variety of angles. To address this problem, we propose a relational monitoring data archive that is designed to efficiently handle high-volume streams of monitoring data. In this paper we present an instrumentation and monitoring event archive service that can be used to collect and aggregate detailed end-to-end monitoring information from distributed applications. This archive service is designed to be scalable and fault tolerant. We also show how the archive is based on the "Grid Monitoring Architecture" defined by the Global Grid Forum.
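
As an illustration of what a relational monitoring archive makes possible, the following is a minimal sketch, not the archive described in the paper. It assumes a hypothetical single-table schema (`event` with `ts`, `host`, `program`, `name`, `value`), invented event names, and an in-memory SQLite database standing in for the archive. It shows one kind of correlation the abstract refers to: pairing application events with system events from the same host inside a time window.

```python
import sqlite3


def build_archive(events):
    """Load (ts, host, program, name, value) rows into an in-memory table.

    The schema is a hypothetical simplification chosen for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE event (
               ts      REAL NOT NULL,  -- event timestamp in seconds
               host    TEXT NOT NULL,  -- where the event was generated
               program TEXT NOT NULL,  -- sensor or application name
               name    TEXT NOT NULL,  -- event type
               value   REAL            -- measured value, if any
           )"""
    )
    # Time-range queries dominate, so index the timestamp column.
    conn.execute("CREATE INDEX event_ts ON event (ts)")
    conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?, ?)", events)
    conn.commit()
    return conn


def correlate(conn, app_event, sys_event, window):
    """Pair each app_event with sys_event rows from the same host
    whose timestamps fall within +/- window seconds."""
    return conn.execute(
        """SELECT a.ts, a.host, a.value, s.ts, s.value
           FROM event AS a
           JOIN event AS s
             ON s.host = a.host
            AND s.name = ?
            AND s.ts BETWEEN a.ts - ? AND a.ts + ?
           WHERE a.name = ?
           ORDER BY a.ts, s.ts""",
        (sys_event, window, window, app_event),
    ).fetchall()


# Invented sample data: a throughput drop on hostA next to a CPU spike.
SAMPLE_EVENTS = [
    (100.0, "hostA", "ftp", "throughput_mbps", 90.0),
    (100.5, "hostA", "vmstat", "cpu_busy_pct", 20.0),
    (110.0, "hostA", "ftp", "throughput_mbps", 12.0),
    (110.2, "hostA", "vmstat", "cpu_busy_pct", 97.0),
    (110.3, "hostB", "vmstat", "cpu_busy_pct", 5.0),
]

if __name__ == "__main__":
    archive = build_archive(SAMPLE_EVENTS)
    for row in correlate(archive, "throughput_mbps", "cpu_busy_pct", 1.0):
        print(row)
```

With the sample rows, the query lines up the low-throughput transfer on hostA with the high CPU reading taken 0.2 seconds later and ignores the unrelated reading from hostB. A production archive would also need the ingestion, scalability, and fault-tolerance machinery the abstract mentions, which this sketch does not attempt.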