Investigation of failure causes in workload-driven reliability testing

Authors:
Domenico Cotroneo;Roberto Pietrantuono;Leonardo Mariani;Fabrizio Pastore
Affiliations:
Università degli Studi di Napoli Federico II, Naples, Italy;Università degli Studi di Napoli Federico II, Naples, Italy;Università degli Studi di Milano Bicocca, Milano;Università degli Studi di Milano Bicocca, Milano
Venue:
Fourth international workshop on Software quality assurance: in conjunction with the 6th ESEC/FSE joint meeting
Year:
2007

Abstract

Virtual execution environments and middleware must be extremely reliable: applications running on top of them are developed assuming their correctness, so platform-level failures can cause serious and unexpected application-level problems. Because software platforms and middleware often run for a long time without interruption, a large part of the testing process is devoted to investigating their behavior under long and stressful executions (these test cases are called workloads). When a problem is identified, software engineers examine log files to find its root cause. Unfortunately, because the workloads are long, log files can contain a huge amount of information, and manual analysis is often prohibitively expensive. Thus, de facto, identifying the root cause is mostly left to the intuition of the software engineer. In this paper, we propose a technique that automatically analyzes logs obtained from workloads to retrieve important information that can relate a failure to its cause. The technique works in three steps: (1) during workload executions, the system under test is monitored; (2) logs extracted from workloads that completed successfully are used to derive compact and general models of the expected behavior of the target system; (3) logs from workloads that terminated unsuccessfully are compared with the inferred models to identify anomalous event sequences. These anomalies help software engineers identify failure causes. The technique can also be used during the operational phase to discover possible causes of unexpected failures by comparing logs of failing executions with models derived at testing time. Preliminary experimental results on the Java Virtual Machine indicate that several bugs can be rapidly identified thanks to the feedback provided by our technique.
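Steps (2) and (3) can be illustrated with a minimal sketch. The abstract does not say which model-inference method the authors use, so the sketch substitutes a deliberately simple stand-in: the model is the set of consecutive event pairs (bigrams) observed in logs of successful workloads, and an anomaly is any pair in a failing log that never occurred in a passing one. The function names (`infer_model`, `find_anomalies`) and the event names (`load`, `verify`, `gc`, and so on) are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Set, Tuple

Event = str
Bigram = Tuple[Event, Event]

# Hypothetical markers so the first and last events of a log are also modeled.
START, END = "<start>", "<end>"


def bigrams(log: Iterable[Event]) -> List[Bigram]:
    """Return consecutive event pairs of a log, padded with START/END."""
    events = [START, *log, END]
    return list(zip(events, events[1:]))


def infer_model(passing_logs: Iterable[Iterable[Event]]) -> Set[Bigram]:
    """Step 2 (stand-in): collect every transition seen in successful workloads."""
    model: Set[Bigram] = set()
    for log in passing_logs:
        model.update(bigrams(log))
    return model


def find_anomalies(model: Set[Bigram],
                   failing_log: Iterable[Event]) -> List[Tuple[int, Bigram]]:
    """Step 3 (stand-in): list (position, transition) pairs absent from the model."""
    return [(i, pair)
            for i, pair in enumerate(bigrams(failing_log))
            if pair not in model]


# Assumed example data: event names are invented, not taken from real JVM logs.
passing_logs = [
    ["load", "verify", "init", "run", "exit"],
    ["load", "verify", "init", "run", "gc", "run", "exit"],
]
failing_log = ["load", "init", "run", "gc", "gc", "exit"]

model = infer_model(passing_logs)
for position, (prev, curr) in find_anomalies(model, failing_log):
    print(f"anomaly at transition {position}: {prev} -> {curr}")
```

In this toy data, the failing log skips `verify` and repeats `gc` before exiting, so those transitions are the ones reported. A set of bigrams is far less expressive than a model that generalizes over event sequences, but it shows the shape of the approach: learn from passing runs, then flag what failing runs do differently.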