Extensible, Scalable Monitoring for Clusters of Computers

Authors: Eric Anderson and Dave Patterson (U. C. Berkeley)
Venue: LISA '97, Proceedings of the 11th USENIX Conference on System Administration
Year: 1997



Abstract

We describe the CARD (Cluster Administration using Relational Databases) system for monitoring large clusters of cooperating computers. CARD scales, both in data-collection capacity and in visualization, to at least 150 machines, and can in principle scale far beyond that. The architecture is easily extended to monitor new cluster software and hardware. CARD detects common faults and automatically recovers from them. Its primary interface is a Java applet, which lets users anywhere in the world monitor the cluster through a web browser.