Software error early detection system based on run-time statistical analysis of function return values

Authors:
Alex Depoutovitch;Michael Stumm
Affiliations:
Dept. of Computer Science and Dept. of Electrical and Computer Engineering, University of Toronto;Dept. of Computer Science and Dept. of Electrical and Computer Engineering, University of Toronto
Venue:
HotACI'06 Proceedings of the First international conference on Hot topics in autonomic computing
Year:
2006

Abstract

Large software systems are extremely complex, and their code changes constantly as bugs are fixed and features are added. As a result, these systems will likely never be free of bugs. Bugs typically do not expose themselves until a new workload triggers them, and when triggered they are rarely fatal immediately. Instead, the system continues to run with corrupt internal state and deteriorates over time until it becomes inoperable. A method for identifying corrupt state early would allow defensive actions to be initiated, such as flushing page caches or redirecting external requests to another service in the cluster. In this paper, we propose a statistical method for detecting software problems at run-time by analyzing function return values. At this time, the methodology requires that source code be available, but it does not require understanding that code. Our experimental results indicate that the method can be effective in identifying problems early, potentially allowing for defensive measures. The overhead is negligible, at less than 1%.
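
The abstract does not say which statistic is computed over the return values, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple frequency baseline: each instrumented function's return values are counted, and a value is flagged when it has been seen in fewer than a chosen fraction of that function's earlier calls. The names `ReturnValueMonitor`, `watch`, `min_samples`, `rare_fraction`, and `read_block` are all hypothetical, and the decorator stands in for the source-level instrumentation the abstract mentions.

```python
from collections import Counter, defaultdict
from functools import wraps


class ReturnValueMonitor:
    """Counts return values per function and flags values that are rare so far.

    Assumption: return values are hashable (e.g. status codes, booleans).
    """

    def __init__(self, min_samples=100, rare_fraction=0.01):
        self.min_samples = min_samples      # hypothetical: calls needed before judging
        self.rare_fraction = rare_fraction  # hypothetical: share below which a value is "rare"
        self.counts = defaultdict(Counter)  # function name -> Counter of return values
        self.anomalies = []                 # flagged (function name, return value) pairs

    def watch(self, func):
        """Decorator that records every return value of func."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            self._record(func.__name__, result)
            return result
        return wrapper

    def _record(self, name, value):
        seen = self.counts[name]
        total = sum(seen.values())  # calls observed before this one
        # Flag the value if enough history exists and the value is rare in it.
        if total >= self.min_samples and seen[value] < self.rare_fraction * total:
            self.anomalies.append((name, value))
        seen[value] += 1


monitor = ReturnValueMonitor(min_samples=100, rare_fraction=0.01)


@monitor.watch
def read_block(n):
    # Hypothetical function: 0 means success, -1 means failure.
    return 0 if n >= 0 else -1


for i in range(200):   # normal workload: every call succeeds
    read_block(i)
read_block(-5)         # a new workload triggers an unusual return value

print(monitor.anomalies)  # [('read_block', -1)]
```

A flagged entry here would be the point at which a defensive action, such as redirecting requests, could be started. A real detector would need a more careful statistic and a policy for values that are rare but harmless.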