A Hybrid Monitor for Behavior and Performance Analysis of Distributed Systems
IEEE Transactions on Software Engineering
This paper describes an integrated tool for continuously monitoring distributed systems during operation. It uses a hybrid monitoring approach. For hardware support, a test and measurement processor (TMP) was designed and built into each node of an experimental multicomputer system. Each TMP runs the local part of the monitoring software for its node, and all TMPs are connected to a central test station through a separate TMP interconnection network. The monitoring system is transparent to users: it continuously observes system behavior, measures system performance, and records system information. The large volume of collected information is displayed graphically in easy-to-read, application-oriented charts and graphs. The tools promote a better understanding of run-time behavior and provide performance measurements from which qualitative and even quantitative assessments of distributed systems can be derived. A prototype of the monitoring facility is operational, and experiments are currently being conducted on our distributed system, which consists of several MC68000 microcomputers.
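The data path described above (per-node TMPs recording locally, then a central test station gathering their traces over a separate network) can be illustrated with a small software model. This is a minimal sketch, not the authors' implementation. The names `Event`, `TMP`, `TestStation`, `observe`, `flush`, `collect`, `global_trace` and `event_counts` are all hypothetical. The sketch also assumes that all TMPs stamp events with a common, synchronized clock, because the merge relies on that. Hardware probes are modeled as plain method calls.

```python
import heapq
from collections import Counter
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    # Events are ordered by timestamp only; this ASSUMES the TMPs share a
    # synchronized clock (an assumption of this sketch, not of the paper).
    timestamp: int
    node_id: int = field(compare=False)
    kind: str = field(compare=False)  # hypothetical labels, e.g. "send", "recv"


class TMP:
    """Hypothetical model of a per-node test and measurement processor.

    It buffers events seen on its node locally, so recording does not use
    the node's application processor or application network.
    """

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.buffer: list[Event] = []

    def observe(self, timestamp: int, kind: str) -> None:
        # In the real system a hardware probe would trigger this;
        # here it is an ordinary method call.
        self.buffer.append(Event(timestamp, self.node_id, kind))

    def flush(self) -> list[Event]:
        # Hand over the local trace and start a fresh buffer.
        out, self.buffer = self.buffer, []
        return out


class TestStation:
    """Hypothetical central station reached over a separate TMP network."""

    def __init__(self) -> None:
        self.streams: list[list[Event]] = []

    def collect(self, tmps: list[TMP]) -> None:
        # Pull each node's trace and sort it locally so streams can be merged.
        for tmp in tmps:
            self.streams.append(sorted(tmp.flush()))

    def global_trace(self) -> list[Event]:
        # k-way merge of the per-node traces into one time-ordered trace.
        return list(heapq.merge(*self.streams))

    def event_counts(self) -> Counter:
        # A simple measurement: occurrences of each event kind per node.
        return Counter((e.node_id, e.kind) for e in self.global_trace())


if __name__ == "__main__":
    tmps = [TMP(0), TMP(1)]
    tmps[0].observe(5, "send")
    tmps[1].observe(7, "recv")
    tmps[0].observe(9, "compute")
    station = TestStation()
    station.collect(tmps)
    print([(e.timestamp, e.node_id, e.kind) for e in station.global_trace()])
    # → [(5, 0, 'send'), (7, 1, 'recv'), (9, 0, 'compute')]
```

In this model, each TMP delivers a sorted local trace. A heap-based merge then yields the global, time-ordered view from which a station could derive displays and measurements. Real hybrid monitors must also handle clock skew, buffer overflow and event filtering, and this sketch deliberately leaves those out.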