In this paper we describe the dproc (distributed /proc) kernel-level mechanisms and abstractions, which provide the building blocks for implementing efficient, cluster-wide, and application-specific performance monitoring. Such monitoring functionality may be constructed at any time, both before and during application invocation, and can include dynamic run-time extensions. This paper (i) presents dproc's implementation in a Linux-based cluster of SMP machines, and (ii) evaluates its utility by constructing sample monitoring functionality. The full version of this paper can be found at: http://www.cc.gatech.edu/systems/projects/dproc/
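To make the idea of application-specific monitoring built on /proc-style data more concrete, the following is a minimal user-space sketch in Python. It is not dproc's kernel-level interface, which the abstract does not specify. The function names (`parse_loadavg`, `sample`, `read_local_loadavg`) and the caller-supplied `should_report` predicate are hypothetical and introduced only for illustration. The sketch assumes the standard Linux `/proc/loadavg` layout, in which the first three fields are the 1-, 5-, and 15-minute load averages.

```python
from typing import Callable, Dict, Optional

Metrics = Dict[str, float]


def parse_loadavg(text: str) -> Metrics:
    """Parse the first three fields of a /proc/loadavg-style line."""
    one, five, fifteen = text.split()[:3]
    return {"load1": float(one), "load5": float(five), "load15": float(fifteen)}


def sample(text: str, should_report: Callable[[Metrics], bool]) -> Optional[Metrics]:
    """Return the parsed metrics only if the application-supplied predicate
    accepts them; otherwise return None so nothing is forwarded.

    The predicate is a hypothetical stand-in for application-specific
    monitoring logic; it is not part of any dproc API.
    """
    metrics = parse_loadavg(text)
    return metrics if should_report(metrics) else None


def read_local_loadavg(path: str = "/proc/loadavg") -> str:
    """Read the local node's load averages (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # Report only when the 1-minute load average exceeds 1.0.
    print(sample(read_local_loadavg(), lambda m: m["load1"] > 1.0))
```

Passing the predicate in as a parameter lets the reporting condition be replaced while the sampler keeps running, which loosely mirrors the abstract's point that monitoring functionality can be constructed or extended at run time.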