Scalable diagnosis in IP networks using path-based measurement and inference: A learning framework

Authors:
Rajesh Narasimha;Souvik Dihidar;Chuanyi Ji;Steven W. McLaughlin
Affiliations:
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250, United States;School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250, United States;School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250, United States;School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250, United States
Venue:
Journal of Visual Communication and Image Representation
Year:
2010

Abstract

In this paper, we investigate the scalability and performance of measurement-based network monitoring, focusing on failure and congestion diagnosis in IP networks for network-based multimedia applications. Path-based measurements using unicast probe packets are obtained at end hosts, and diagnosis is performed by exploiting the spatial dependence among those measurements. We formulate network monitoring in a machine learning framework using probabilistic graphical models, which infer the network states (on/off) from unicast measurements. We provide fundamental limits on the relationship between the number of probe packets, the size of the network, and the ability to diagnose either failed links or congested network components. Specifically, the diagnosis problem is addressed in two parts. First, for fault diagnosis, we construct a graphical model using a Bayesian belief network for path-based measurements. We then provide a lower bound on the average number of probes per edge required for link failure diagnosis, using variational inference under "noisy" probe measurements. Variational inference provides a tractable approximation for estimating the number of spatially dependent measurements needed for diagnosis in large networks. We then develop an entropy lower (EL) bound by drawing an analogy between coding over a binary symmetric channel (BSC) and link failure diagnosis. Both bounds show that the number of measurements needed for diagnosis grows linearly with the number of links. The analytical results are validated by simulation. Second, for congestion diagnosis, we propose a solution based on the decoding of linear error control codes on a BSC. In this scenario, we consider path-based probing experiments under both noiseless and "noisy" measurements and compare the performance of our approach against the fundamental limits. To identify the congested nodes, we construct a factor graph, and congestion is inferred using the belief-propagation algorithm.
Simulation results demonstrate the ability of our approach to perfectly localize congested nodes using a scalable number of measurements and a computationally efficient algorithm. We believe that this study can ease the problems arising from the lack of QoS support and help provide good-quality broadband multimedia services.
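
The linear growth claimed for the entropy-style bound can be illustrated with a generic counting argument. The sketch below is not the paper's derivation. It assumes that each link fails independently with probability p_fail, and that each binary probe outcome is flipped with probability noise, as on a BSC. Under those assumptions, a standard information-theoretic converse says that reliably identifying the state of n links needs at least n·H(p_fail) / (1 − H(noise)) measurements, where H is the binary entropy function. The function and parameter names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits; defined as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def min_probes(num_links: int, p_fail: float, noise: float) -> float:
    """Generic lower bound on the number of binary probe measurements.

    Assumptions (illustrative, not taken from the paper):
      - link states are i.i.d. Bernoulli(p_fail), so they carry
        num_links * H(p_fail) bits of uncertainty;
      - each measurement passes through a BSC with crossover
        probability `noise`, so it conveys at most 1 - H(noise) bits.
    """
    capacity = 1.0 - binary_entropy(noise)
    if capacity <= 0.0:
        raise ValueError("measurements carry no information when noise = 0.5")
    return num_links * binary_entropy(p_fail) / capacity


if __name__ == "__main__":
    # The bound scales linearly with the number of links.
    for n in (100, 200, 400):
        print(n, round(min_probes(n, p_fail=0.05, noise=0.1), 1))
```

Doubling num_links doubles the bound, which matches the linear scaling stated in the abstract. The constant depends on the assumed failure probability and noise level.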