Learning from software failures is an essential step toward developing more reliable software systems and processes. However, as software systems grow more intricate, determining the nature and causes of a failure becomes increasingly difficult. Many existing techniques can help in understanding the nature of a failure, but they are limited in one or more of the following respects. First, they work only within controlled environments. Second, they have a major impact on the behavior of the target system. Third, they assume that the failure can be reproduced. Fourth, they lack sufficient support for structured failure analysis. In this paper, we present the Software Black Box (SBB) as an alternative mechanism for failure investigation. The SBB differs from its predecessors in that it was specifically designed to be embedded in a target system and to assist in the investigation of failures by reconstructing the events that led to the failure. We discuss the SBB architecture and present a set of failure scenarios that reveal the SBB's potential for assisting failure investigation.