Automatic Failure-Path Inference (AFPI) is an application-generic, automatic technique for dynamically discovering the failure dependency graphs of componentized Internet applications. AFPI's first phase is invasive, and relies on controlled fault injection to determine failure propagation; this phase requires no a priori knowledge of the application and takes on the order of hours to run. Once the system is deployed in production, the second, non-invasive phase of AFPI passively monitors the system, and updates the dependency graph as new failures are observed. This process is a good match for the perpetually-evolving software found in Internet systems; since no performance overhead is introduced, AFPI is feasible for live systems. We applied AFPI to J2EE and tested it by injecting Java exceptions into an e-commerce application and an online auction service. The resulting graphs of exception propagation are more detailed and accurate than what could be derived by time-consuming manual inspection or analysis of readily-available static application descriptions.
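The two phases described above can be illustrated with a small sketch. This is not the authors' implementation and does not use J2EE; it is a toy model written in Python under stated assumptions. The component names, the `CALLS` call table, the `MASKS` set of handled failures, and the functions `invoke`, `infer_failure_graph`, and `observe_failure` are all hypothetical. The sketch injects a fault into each component in turn, records which callers fail as a direct result, and then lets a later passive observation add an edge to the same graph.

```python
from collections import defaultdict

# Hypothetical toy application: each component lists the components it calls.
CALLS = {
    "web": ["cart", "catalog"],
    "cart": ["db"],
    "catalog": ["db"],
    "db": [],
}
# Hypothetical: (caller, callee) pairs where the caller absorbs callee failures.
MASKS = {("web", "catalog")}


class InjectedFault(Exception):
    """Stands in for an exception injected into one component."""


def invoke(component, faulty, graph):
    """Run `component`; record an edge callee -> caller when a failure propagates."""
    if component == faulty:
        raise InjectedFault(component)
    for callee in CALLS[component]:
        try:
            invoke(callee, faulty, graph)
        except InjectedFault:
            if (component, callee) in MASKS:
                continue  # the caller handles this failure, so nothing propagates
            graph[callee].add(component)
            raise


def infer_failure_graph():
    """Invasive phase (sketch): inject a fault into every component in turn."""
    graph = defaultdict(set)
    for faulty in CALLS:
        for entry in CALLS:  # exercise every component as an entry point
            try:
                invoke(entry, faulty, graph)
            except InjectedFault:
                pass  # the failure reached the top level; edges are already recorded
    return graph


def observe_failure(graph, failed, affected):
    """Non-invasive phase (sketch): add an edge seen during passive monitoring."""
    graph[failed].add(affected)


graph = infer_failure_graph()
for source in sorted(graph):
    print(source, "->", sorted(graph[source]))
```

In this toy model the injection phase finds no edge from `catalog` to `web`, because `web` is assumed to mask that failure. If monitoring later observed such a propagation, a call like `observe_failure(graph, "catalog", "web")` would add the edge without re-running the injection campaign.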