Metric is a distributed software measurement system that communicates measurement data over the PARC computer network, the Ethernet. Metric is used to instrument stand-alone and distributed computer systems; it operates in an environment of about 90 machines in total, about 15 of which use it. The system is divided into three parts: object system probes that transmit measurement events, the accountant that receives and stores those events, and the analyst that manipulates the data for the user.
Measurement events, small packets of standardly formatted measurement data, are used in a way that emphasizes their independence, history and context in a running system. Events are not counts of some system activity; each is a mini-snapshot of the state of the system at the moment some activity begins or ends. In this way events provide context about what is happening in the system, and a succession of events provides a rich history of what has occurred in the system under study. The contextual information intrinsic to an event supports its independence: the event carries with it the information needed to describe what it is about.
Metric's robustness is a direct consequence of its simplicity. Its simple communication protocols and the independence of its parts prevent failures in the Metric system from interfering with the user's object system, and most failures in the object system are unlikely to interfere with the functioning of Metric. The standard format of events lets the accountant receive events from different environments in a straightforward fashion and makes data handling easier for the analyst. Another advantage of Metric's simplicity is its economy of use: object system probes take about 100 microseconds to transmit an event. Object systems that use Metric continuously transmit event data.
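The probe-to-accountant path can be sketched in Python. This is a minimal illustration under stated assumptions, not Metric's implementation: the 36-byte event layout, the field names, and the functions `probe`, `udp_sender` and class `Accountant` are all hypothetical, and a UDP datagram socket stands in for the Ethernet transport. The point it shows is that each event is a self-describing snapshot (source machine, process, timestamp, a small state word, the activity name), so the accountant can store it with no knowledge of the sender.

```python
import socket
import struct
import time
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical wire layout (an assumption, NOT Metric's real format), network byte order:
# version (u8), kind (u8), machine id (u16), process id (u32),
# timestamp in microseconds (u64), state word (u32), activity name (16 bytes).
EVENT_FORMAT = "!BBHIQI16s"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # 36 bytes, no padding with "!"
VERSION = 1
BEGIN, END = 1, 2  # an event marks the start or the end of some activity


@dataclass(frozen=True)
class Event:
    """A self-describing mini-snapshot taken when an activity begins or ends."""
    kind: int
    machine: int
    process: int
    timestamp_us: int
    state: int
    activity: str

    def encode(self) -> bytes:
        # struct truncates or NUL-pads the name to exactly 16 bytes.
        return struct.pack(EVENT_FORMAT, VERSION, self.kind, self.machine,
                           self.process, self.timestamp_us, self.state,
                           self.activity.encode("ascii"))

    @staticmethod
    def decode(packet: bytes) -> "Event":
        version, kind, machine, process, ts, state, name = struct.unpack(
            EVENT_FORMAT, packet)
        if version != VERSION:
            raise ValueError(f"unknown event version {version}")
        return Event(kind, machine, process, ts, state,
                     name.rstrip(b"\0").decode("ascii"))


def probe(send: Callable[[bytes], None], kind: int, machine: int,
          process: int, state: int, activity: str) -> None:
    """Object-system probe: build one event and fire it off without waiting."""
    event = Event(kind, machine, process, time.time_ns() // 1000, state, activity)
    try:
        send(event.encode())
    except OSError:
        pass  # a measurement failure must never disturb the object system


def udp_sender(host: str, port: int) -> Callable[[bytes], None]:
    """Hypothetical transport: one unacknowledged UDP datagram per event."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(packet: bytes) -> None:
        sock.sendto(packet, (host, port))

    return send


class Accountant:
    """Receives events from any source and appends them, unparsed, to a log."""

    def __init__(self) -> None:
        self.log: List[bytes] = []

    def receive(self, packet: bytes) -> None:
        # The standard format means one size check suffices for every source.
        if len(packet) == EVENT_SIZE:
            self.log.append(packet)


if __name__ == "__main__":
    accountant = Accountant()
    probe(accountant.receive, BEGIN, 7, 42, 3, "disk_read")
    probe(accountant.receive, END, 7, 42, 2, "disk_read")
    print([Event.decode(p).activity for p in accountant.log])
```

Passing `accountant.receive` directly keeps the example local; over a network the probe would be given `udp_sender(host, port)` instead. The probe swallows transport errors, so a failed send costs the object system nothing beyond the attempt. In a running system the probes fire continuously, whenever an instrumented activity begins or ends.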
Because the event stream is continuous, the event history log maintained by the accountant can be examined after particularly mysterious crashes to determine what the system had been doing just before the crash. The tripartite division of the analyst into a kernel, a utility layer and an application layer simplifies the maintenance, use and extension of the system. The kernel understands the event format and acts on behalf of applications to examine data collected by the accountant. The utility layer understands global system structures and language constructs, simplifying data analysis and presentation. The application layer consists of specific code written to answer particular questions about a system; it is usually quite small and simple.
In summary, Metric is unusual because of the way it exploits the Ethernet, its insistence on standardized measurement information, its efforts to make information intelligible to its users, and its extensibility in the face of very different user environments. Isolating Metric's parts on different machines that communicate over the Ethernet has proven a very effective way of achieving a remarkably robust, low-cost measurement tool. Metric's emphasis on the context and history associated with measurements makes the measurement data easier to use.
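The analyst's three layers can be illustrated with a short Python sketch. It assumes the accountant's log is a list of raw event packets in a hypothetical fixed 36-byte layout (not Metric's actual format), and every function name here (`kernel_events`, `utility_intervals`, `utility_last_events`, `mean_duration_us`, `what_was_happening`) is an illustrative invention. Only the kernel knows the packet layout; the utility layer turns events into system-level structures (begin/end intervals, each machine's latest event); each application is a few lines answering one question, including the post-crash question of what each machine had been doing.

```python
import struct
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event layout (an assumption, NOT Metric's real format):
# version, kind, machine, process, timestamp (µs), state word, 16-byte name.
EVENT_FORMAT = "!BBHIQI16s"
BEGIN, END = 1, 2


@dataclass(frozen=True)
class Event:
    kind: int
    machine: int
    process: int
    timestamp_us: int
    state: int
    activity: str


# Kernel layer: the only code that knows the wire format of events.
def kernel_events(log: Iterable[bytes]) -> Iterator[Event]:
    for packet in log:
        version, kind, machine, process, ts, state, name = struct.unpack(
            EVENT_FORMAT, packet)
        if version == 1:  # skip records in formats this kernel cannot read
            yield Event(kind, machine, process, ts, state,
                        name.rstrip(b"\0").decode("ascii"))


@dataclass(frozen=True)
class Interval:
    machine: int
    process: int
    activity: str
    start_us: int
    end_us: int


# Utility layer: builds system-level structures out of raw events.
def utility_intervals(events: Iterable[Event]) -> List[Interval]:
    """Pair each END with the most recent BEGIN of the same activity."""
    open_begins: Dict[Tuple[int, int, str], int] = {}
    intervals: List[Interval] = []
    for e in events:
        key = (e.machine, e.process, e.activity)
        if e.kind == BEGIN:
            open_begins[key] = e.timestamp_us
        elif e.kind == END and key in open_begins:
            intervals.append(Interval(e.machine, e.process, e.activity,
                                      open_begins.pop(key), e.timestamp_us))
    return intervals


def utility_last_events(events: Iterable[Event]) -> Dict[int, Event]:
    """Latest event per machine, compared only against that machine's own clock."""
    last: Dict[int, Event] = {}
    for e in events:
        if e.machine not in last or e.timestamp_us >= last[e.machine].timestamp_us:
            last[e.machine] = e
    return last


# Application layer: small, question-specific code.
def mean_duration_us(log: Iterable[bytes], activity: str) -> float:
    durations = [iv.end_us - iv.start_us
                 for iv in utility_intervals(kernel_events(log))
                 if iv.activity == activity]
    return sum(durations) / len(durations) if durations else 0.0


def what_was_happening(log: Iterable[bytes]) -> Dict[int, str]:
    """Post-crash question: what was each machine doing when events stopped?"""
    return {m: ("in " if e.kind == BEGIN else "finished ") + e.activity
            for m, e in utility_last_events(kernel_events(log)).items()}
```

Because applications touch only utility-layer structures, a change to the event format would in this sketch be absorbed by the kernel alone, which is one plausible reading of why the layering eases maintenance and extension.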