Rx: treating bugs as allergies---a safe method to survive software failures
Proceedings of the twentieth ACM symposium on Operating systems principles
Have things changed now?: an empirical study of bug characteristics in modern open source software
Proceedings of the 1st workshop on Architectural and system support for improving software dependability
ACM Transactions on Computer Systems (TOCS)
On modeling and tolerating incorrect software
Journal of High Speed Networks - Self-Stabilizing Systems, Part 2
Correlating multi-session attacks via replay
HOTDEP'06 Proceedings of the 2nd conference on Hot Topics in System Dependability - Volume 2
Treating bugs as allergies: a safe method for surviving software failures
HOTOS'05 Proceedings of the 10th conference on Hot Topics in Operating Systems - Volume 10
Exploring failure transparency and the limits of generic recovery
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Rx: Treating bugs as allergies—a safe method to survive software failures
ACM Transactions on Computer Systems (TOCS)
Fault Tolerance via Diversity for Off-the-Shelf Products: A Study with SQL Database Servers
IEEE Transactions on Dependable and Secure Computing
Learning from mistakes: a comprehensive study on real world concurrency bug characteristics
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Uncertainty explicit assessment of off-the-shelf software: A Bayesian approach
Information and Software Technology
Using Inherent Service Redundancy and Diversity to Ensure Web Services Dependability
Methods, Models and Tools for Fault Tolerance
An empirical study of reported bugs in server software with implications for automated bug diagnosis
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1
Correlating multi-session attacks via replay
HotDep'06 Proceedings of the Second conference on Hot topics in system dependability
How do programs become more concurrent: a story of program transformations
Proceedings of the 4th International Workshop on Multicore Software Engineering
Locating failure-inducing environment changes
Proceedings of the 10th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools
Dependable composite web services with components upgraded online
Architecting Dependable Systems III
F(I)MEA-technique of web services analysis and dependability ensuring
Rigorous Development of Complex Fault-Tolerant Systems
How does testing affect the availability of aging software systems?
Performance Evaluation
A characteristic study on failures of production distributed data-parallel programs
Proceedings of the 2013 International Conference on Software Engineering
Discovering, reporting, and fixing performance bugs
Proceedings of the 10th Working Conference on Mining Software Repositories
Hi-index | 0.00 |
This paper tests the hypothesis that generic recovery techniques, such as process pairs, can survive most application faults without using application-specific information. We examine in detail the faults that occur in three, large, open-source applications: the Apache web server, the GNOME desktop environment, and the MySQL database. Using information contained in the bug reports and source code, we classify faults based on how they depend on the operating environment. We find that 72-87% of the faults are independent of the operating environment and are hence deterministic (non-transient). Recovering from the failures caused by these faults requires the use of application-specific knowledge. Half of the remaining faults depend on a condition in the operating environment that is likely to persist on retry, and the failures caused by these faults are likely to require application-specific recovery. Unfortunately, only 5-14% of the faults were triggered by transient conditions, such as timing and synchronization that naturally fix them during recovery. Our results indicate that classical application-generic recovery techniques, such as process pairs, will not be sufficient to enable applications to survive most failures caused by application faults.