It is a challenge to provide detection facilities for large-scale distributed systems running legacy code on hosts that may not allow fault-tolerant functions to execute on them. It is tempting to structure the detection in an observer system that is kept separate from the observed system of protocol entities, with the former having access only to the latter's external message exchanges. In this paper, we propose an autonomous self-checking Monitor system that provides fast detection for underlying network protocols. The Monitor architecture is application-neutral and therefore lends itself to deployment for different protocols; the rulebase against which the observed interactions are matched is what makes a deployment specific to a protocol. To make the detection infrastructure scalable and dependable, we extend it to a hierarchical Monitor structure. The Monitor structure is made dynamic and reconfigurable by designing different interactions to cope with failures, load changes, or mobility. The latency of the Monitor system is evaluated under fault-free conditions, while its coverage is evaluated under simulated error injections.
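The idea of matching externally observed messages against a protocol-specific rulebase can be illustrated with a minimal Python sketch. It is not the paper's Monitor implementation: the `Message`, `Rule`, and `Monitor` names, the single "trigger must be answered within a deadline" rule form, and the `DATA`/`ACK` message kinds are all assumptions made for illustration, and the hierarchical and reconfigurable aspects are not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    """An externally observed message (hypothetical format)."""
    time: float
    src: str
    dst: str
    kind: str


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a `trigger` message must be answered by an
    `expected` message in the reverse direction within `deadline`."""
    trigger: str
    expected: str
    deadline: float


@dataclass
class Monitor:
    """Generic matching engine; only the rulebase is protocol-specific."""
    rules: list
    pending: list = field(default_factory=list)     # (due_time, rule, trigger_msg)
    violations: list = field(default_factory=list)  # (rule, trigger_msg)

    def _expire(self, now):
        # Move every expectation whose deadline has passed into violations.
        still_pending = []
        for due, rule, trig in self.pending:
            if now > due:
                self.violations.append((rule, trig))
            else:
                still_pending.append((due, rule, trig))
        self.pending = still_pending

    def observe(self, msg):
        self._expire(msg.time)
        # A response satisfies the oldest matching pending expectation.
        for i, (due, rule, trig) in enumerate(self.pending):
            if (msg.kind == rule.expected
                    and msg.src == trig.dst and msg.dst == trig.src):
                del self.pending[i]
                break
        # A trigger message opens a new expectation for each matching rule.
        for rule in self.rules:
            if msg.kind == rule.trigger:
                self.pending.append((msg.time + rule.deadline, rule, msg))

    def flush(self, now):
        """Report all violations detected up to time `now`."""
        self._expire(now)
        return self.violations


# Assumed rulebase for a toy protocol: every DATA must be ACKed within 1.0 time units.
monitor = Monitor(rules=[Rule(trigger="DATA", expected="ACK", deadline=1.0)])
monitor.observe(Message(0.0, "A", "B", "DATA"))
monitor.observe(Message(0.5, "B", "A", "ACK"))   # answered in time
monitor.observe(Message(2.0, "A", "B", "DATA"))  # never answered
detected = monitor.flush(now=4.0)
```

In this sketch, swapping the list passed as `rules` is the only change needed to monitor a different protocol, which mirrors the separation between the application-neutral architecture and the protocol-specific rulebase described above.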