This paper describes Analyze-NOW, an environment for collecting and analyzing failures and errors in a network of workstations. It covers the data collection methodology and the tool implemented to support it. The software tools used for analysis are described, with emphasis on the implementation details of the Analyzer, the primary analysis tool. The tools are demonstrated by collecting and analyzing failure data over a 32-week period from a network of 69 SunOS-based workstations. Failures are classified by the source and the effect of the underlying faults to identify problem areas. The different types of failures encountered on the machines and on the network are highlighted to build an understanding of failures in a network environment. Lastly, a case is made for using the results from the analysis tool to pinpoint the problem areas in the network.