Developers and users of high-performance distributed systems often observe performance problems such as unexpectedly low throughput or high latency. Finding the source of these problems requires correlating detailed end-to-end monitoring data from applications, networks, operating systems, and hardware across time and space, and researchers need to be able to view and compare this data from a variety of angles. To address this problem, we propose a relational monitoring data archive designed to handle high-volume streams of monitoring data efficiently. In this paper we present an instrumentation and monitoring event archive service that collects and aggregates detailed end-to-end monitoring information from distributed applications. The archive service is designed to be scalable and fault tolerant. We also show how the archive is based on the Grid Monitoring Architecture (GMA) defined by the Global Grid Forum.
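The abstract describes two ideas: storing a high-volume event stream in a relational archive, and correlating events from different hosts. The following is a minimal sketch of both, not the paper's schema or code. It uses Python's standard-library sqlite3 module. The `Event` fields, the `events` table, and the `EventArchive` class with its `consume`, `flush`, and `latencies` methods are all hypothetical names chosen for illustration.

In the sketch, `consume` plays a GMA-style consumer role: it receives events pushed by a producer and batches the inserts into single transactions. `latencies` joins start and end events that share a request identifier, which lets it compute an end-to-end duration even when the two events come from different hosts.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Event:
    """One monitoring event; these field names are illustrative assumptions."""
    ts: float      # timestamp in seconds
    host: str      # host that emitted the event
    name: str      # event name, e.g. "xfer.start" (hypothetical)
    req_id: str    # key used to correlate events belonging to one request


class EventArchive:
    """Hypothetical archive that consumes an event stream and stores it relationally."""

    def __init__(self, path=":memory:", batch_size=1000):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            " ts REAL NOT NULL, host TEXT NOT NULL,"
            " name TEXT NOT NULL, req_id TEXT NOT NULL)"
        )
        # Index the correlation key so cross-host joins stay cheap as the table grows.
        self.db.execute("CREATE INDEX IF NOT EXISTS ev_req ON events (req_id, name)")
        self.batch_size = batch_size
        self.buffer = []

    def consume(self, event):
        """Accept one event pushed by a producer (GMA-style consumer role)."""
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write all buffered events in one transaction to amortize insert cost."""
        if not self.buffer:
            return
        with self.db:  # commits on success, rolls back on error
            self.db.executemany(
                "INSERT INTO events VALUES (?, ?, ?, ?)",
                [(e.ts, e.host, e.name, e.req_id) for e in self.buffer],
            )
        self.buffer.clear()

    def latencies(self, start_name, end_name):
        """Pair start/end events by req_id (possibly from different hosts)."""
        self.flush()  # make buffered events visible to the query
        rows = self.db.execute(
            "SELECT s.req_id, s.host, e.host, e.ts - s.ts"
            " FROM events s JOIN events e ON s.req_id = e.req_id"
            " WHERE s.name = ? AND e.name = ?"
            " ORDER BY s.req_id",
            (start_name, end_name),
        )
        return rows.fetchall()


if __name__ == "__main__":
    archive = EventArchive(batch_size=2)
    archive.consume(Event(100.0, "client", "xfer.start", "r1"))
    archive.consume(Event(100.5, "server", "xfer.end", "r1"))
    archive.consume(Event(101.0, "client", "xfer.start", "r2"))
    archive.consume(Event(101.25, "server", "xfer.end", "r2"))
    print(archive.latencies("xfer.start", "xfer.end"))
    # [('r1', 'client', 'server', 0.5), ('r2', 'client', 'server', 0.25)]
```

Batching adds a small delay before events become queryable, but it greatly reduces per-event insert overhead, which matters for high-volume streams. Subtracting timestamps recorded on different hosts only produces meaningful latencies if those hosts' clocks are synchronized.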