dproc - Extensible Run-Time Resource Monitoring for Cluster Applications

Authors:
Jasmina Jancic;Christian Poellabauer;Karsten Schwan;Matthew Wolf;Neil Bright
Venue:
ICCS '02 Proceedings of the International Conference on Computational Science-Part II
Year:
2002

Abstract

In this paper we describe the dproc (distributed /proc) kernel-level mechanisms and abstractions, which provide the building blocks for implementing efficient, cluster-wide, and application-specific performance monitoring. Such monitoring functionality may be constructed at any time, both before and during application invocation, and can include dynamic run-time extensions. This paper (i) presents dproc's implementation in a Linux-based cluster of SMP machines, and (ii) evaluates its utility by constructing sample monitoring functionality. The full version of this paper can be found at: http://www.cc.gatech.edu/systems/projects/dproc/