Line speed accurate superspreader identification using dynamic error compensation

  • Authors:
  • Guang Cheng;Yongning Tang

  • Venue:
  • Computer Communications
  • Year:
  • 2013

Quantified Score

Hi-index 0.24

Abstract

A superspreader is a source that connects to a large number of distinct destinations during a measurement interval. Identifying superspreaders provides important input for many applications, such as port scanner and Distributed Denial of Service (DDoS) attack detection, worm propagation measurement, and hot-spot localization in peer-to-peer (P2P) networks and Content Delivery Networks (CDNs). Several challenges remain in superspreader identification, especially for high-speed networks, arising from the trade-offs among three critical requirements: (1) fast data processing, (2) low memory usage, and (3) high accuracy. In this paper, we propose a new Superspreader Identification System (SIS), which uses a bitmap to identify new network flows and a counting Bloom filter to record the number of flows for each source. Several error compensation mechanisms are designed to dynamically compensate for errors in the system (e.g., those caused by hash collisions) while its state is changing, and thus substantially improve its accuracy in superspreader identification. Multiple lightweight data operation techniques are also introduced to expedite data processing and reduce the required memory. The SIS requires only 256 KB of SRAM to identify superspreaders in high-speed data streams (e.g., OC-192) at line speed with more than 95% identification accuracy. A theoretical performance analysis evaluates the false negative ratio, false positive ratio, and compensation efficacy of the SIS. Comprehensive experiments using real Internet traces from a tier-1 Internet Service Provider (ISP) network demonstrate that the SIS is superior to related approaches in terms of the trade-offs it achieves among required memory space, identification accuracy, and data processing efficiency and capacity.
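
The two data structures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `SpreaderSketch`, the hash function (BLAKE2b), the structure sizes, and the number of hashes are all assumptions chosen for illustration, and the error compensation mechanisms described in the abstract are omitted entirely. The sketch only shows the basic idea: a bitmap filters out flows that have already been seen, and a counting Bloom filter accumulates the number of new flows per source.

```python
import hashlib


class SpreaderSketch:
    """Illustrative bitmap + counting Bloom filter (hypothetical, no error compensation)."""

    def __init__(self, bitmap_bits=1 << 16, cbf_counters=1 << 12, num_hashes=3):
        # Sizes are arbitrary illustration values, not the paper's parameters.
        # bitmap_bits is assumed to be a multiple of 8.
        self.bitmap_bits = bitmap_bits
        self.bitmap = bytearray(bitmap_bits // 8)
        self.cbf = [0] * cbf_counters
        self.num_hashes = num_hashes

    @staticmethod
    def _hash(key, seed, modulus):
        # Seeded hash built from BLAKE2b; any family of independent hashes would do.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % modulus

    def update(self, src, dst):
        """Process one packet identified by its (source, destination) pair."""
        bit = self._hash(f"{src}|{dst}", 0, self.bitmap_bits)
        byte, mask = bit >> 3, 1 << (bit & 7)
        if self.bitmap[byte] & mask:
            # Bit already set: the flow was seen before, or another flow collided
            # on the same bit (which would make the source undercounted).
            return
        self.bitmap[byte] |= mask
        # New flow: increment every counter that the source hashes to.
        for i in range(1, self.num_hashes + 1):
            self.cbf[self._hash(src, i, len(self.cbf))] += 1

    def estimate(self, src):
        """Estimated number of distinct destinations for src (minimum counter)."""
        return min(
            self.cbf[self._hash(src, i, len(self.cbf))]
            for i in range(1, self.num_hashes + 1)
        )

    def superspreaders(self, sources, threshold):
        """Return the candidate sources whose estimate reaches the threshold."""
        return [s for s in sources if self.estimate(s) >= threshold]
```

In this simplified form, bitmap collisions cause undercounting and counter collisions in the Bloom filter cause overcounting. These are the kinds of hash-collision errors that, according to the abstract, the system's compensation mechanisms are designed to correct.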