Towards ideal network traffic measurement: a statistical algorithmic approach

  • Authors:
  • Jun (Jim) Xu; Qi Zhao

  • Affiliations:
  • Georgia Institute of Technology; Georgia Institute of Technology

  • Venue:
  • Doctoral dissertation, Georgia Institute of Technology
  • Year:
  • 2007

Abstract

With the emergence of computer networks as one of the primary platforms of communication, and with their adoption for an increasingly broad range of applications, there is a growing need for high-quality network traffic measurements to better understand, characterize, and engineer network behavior. Because the original design of the Internet lacks fine-grained measurement capabilities, it does not yield enough data to compute, or even approximate, some traffic statistics such as traffic matrices and per-link delay. While it is possible to infer these statistics from indirect aggregate measurements that are widely supported by network measurement devices (e.g., routers), obtaining the best possible inferences is often a challenging research problem. We call this the "too little data" problem, after its root cause. Interestingly, while "too little data" is clearly a problem, "too much data" is not a blessing either. The main contribution of this dissertation is the design of new software and hardware technologies to address these challenges. We propose a novel statistical algorithmic solution consisting of three complementary methodologies.

First, we propose network data inference from multiple data sources to counter the "too little data" problem. To meet the ever-increasing demand for traffic monitoring, large ISPs are deploying new traffic measurement capabilities on their IP backbones. For example, in the AT&T backbone, besides the Simple Network Management Protocol (SNMP), which has long run across the whole network, Cisco sampled NetFlow has in recent years been instrumented at the ingress/egress edge routers. Each capability generates a separate measurement data set that provides a complementary observation of the statistics of interest. Our methodology is to analyze these data sets jointly to achieve better inference accuracy. Another important advantage of this methodology is that it can identify accidental errors in measurement results (called "dirty data"): although different measurement capabilities operate independently, they may collect some common information, and this redundancy can be cross-checked across measurement results so that errors can be identified and removed. Using this methodology, we design a set of methods for robust traffic matrix estimation and detection of dirty data by correlating both link-level and path-level information.
Second, network data streaming has been recognized in the networking research community as a new approach to alleviating the "too much data" problem. It is concerned with processing a long data stream (e.g., network traffic) in a single pass, using a small working memory, to answer a class of queries regarding the stream. This methodology usually requires some extra hardware, such as SRAM chips, to support high-speed network links (e.g., 40 Gbps), but it typically provides more accurate results. In this dissertation, we design data streaming algorithms for estimating several important traffic statistics that have traditionally been considered hard to measure at high-speed network links and routers. We also devise novel methods to correlate data streaming with traditional sampling techniques for measuring network traffic, linking the data streaming methodology with the aforementioned methodology of combining multiple data sources. In addition, these works lead to notable mathematical results and methods, such as a new large deviation theorem that finds applications in various areas.

Third, enhancing the storage and processing hardware of a measurement system is another way to alleviate these challenges. In this dissertation, we focus on a specific fundamental measurement tool: counting. In particular, we explore how to maintain a large counter array efficiently at high speed. Many network measurement applications need to maintain a large number of counters, incremented by a high-speed massive data stream (e.g., network traffic), in order to record all kinds of information. As line rates keep increasing, each counter matched by a packet must be read, incremented, and written back within a tiny cycle (e.g., a few nanoseconds), so these counters must be held in fast memory (SRAM). However, a router may need to support millions of real-time counters, and the counters must be wide (e.g., 64 bits) to avoid overflowing quickly, so placing all of them in SRAM would require infeasible amounts of fast memory. We design an optimal hybrid SRAM/DRAM statistics counter architecture to address this problem, and we derive a tight statistical bound on its performance, which adopts the large deviation theorem developed in our research on network data streaming. (Abstract shortened by UMI.)
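To illustrate the data streaming style described above (one pass over the traffic, small fixed memory, approximate answer), here is a minimal sketch using a standard linear-counting bitmap for distinct flow counting. This is a well-known textbook technique chosen for illustration; it is not one of the dissertation's specific algorithms, and the bitmap size and hash choice are assumptions.

```python
import hashlib
import math

M = 1 << 12                       # bitmap size: 4096 bits of "SRAM"
bitmap = bytearray(M // 8)

def observe(flow_id: str) -> None:
    """Process one packet in a single pass: hash its flow ID to one bit."""
    h = int.from_bytes(hashlib.sha1(flow_id.encode()).digest()[:8], "big") % M
    bitmap[h // 8] |= 1 << (h % 8)

def estimate_distinct_flows() -> float:
    """Linear-counting estimate from the fraction of bits still zero."""
    zeros = sum(bin(b ^ 0xFF).count("1") for b in bitmap)
    return -M * math.log(zeros / M)

# Simulated packet stream with many repeated flows; memory stays at 512 bytes
# no matter how long the stream runs.
for i in range(3000):
    observe(f"flow-{i % 1000}")
print(round(estimate_distinct_flows()))   # close to the true value, 1000
```

The key property, as in the abstract, is that accuracy comes from the statistical estimator rather than from storing per-flow state.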
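Finally, a minimal sketch of the hybrid SRAM/DRAM counter idea: narrow, fast SRAM counters absorb per-packet increments, and a counter nearing overflow is queued for flushing into a wide DRAM counter. The 5-bit SRAM width and the plain FIFO flush policy are illustrative assumptions, not the exact architecture; in the dissertation, the statistical bound derived via the large deviation theorem is what guarantees the flush queue stays short enough that SRAM counters never wrap.

```python
from collections import deque

SRAM_BITS = 5                        # narrow on-chip counters
SRAM_MAX = (1 << SRAM_BITS) - 1
N = 1_000_000                        # number of counters

sram = [0] * N                       # fast memory: narrow counters
dram = [0] * N                       # slow memory: full-width (64-bit) counters
flush_queue = deque()                # counters waiting to be folded into DRAM

def increment(i: int) -> None:
    """Per-packet fast path: touches SRAM only."""
    sram[i] += 1
    if sram[i] == SRAM_MAX:          # about to overflow: schedule a flush
        flush_queue.append(i)

def flush_one() -> None:
    """Slow path, run at DRAM speed: fold one SRAM counter into DRAM.
    A real design bounds the queue length so flushes happen in time."""
    if flush_queue:
        i = flush_queue.popleft()
        dram[i] += sram[i]
        sram[i] = 0

def read(i: int) -> int:
    """A counter's true value is split across the two memories."""
    return dram[i] + sram[i]
```

The design choice this sketch reflects is the one the abstract names: SRAM provides the per-packet update speed, DRAM provides the width and scale, and the flush schedule bridges the two.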