Line speed accurate superspreader identification using dynamic error compensation

Authors:
Guang Cheng;Yongning Tang
Venue:
Computer Communications
Year:
2013

Abstract

A superspreader is a source that connects to a large number of distinct destinations during a measurement interval. Identifying superspreaders supports many applications, such as port-scan and Distributed Denial of Service (DDoS) attack detection, worm propagation measurement, and hot-spot localization in peer-to-peer (P2P) networks and Content Delivery Networks (CDNs). Superspreader identification remains challenging, especially in high-speed networks, because it must trade off three critical requirements: (1) fast data processing, (2) low memory usage, and (3) high accuracy. In this paper, we propose a new Superspreader Identification System (SIS), which uses a bitmap to identify new network flows and a counting Bloom filter to record the number of flows for each source. Several error compensation mechanisms dynamically compensate for system errors (e.g., those caused by hash collisions) as the system's state changes, and thus substantially improve its accuracy in superspreader identification. Multiple lightweight data operation techniques are also introduced to expedite data processing and reduce the required memory. SIS requires only 256 KB of SRAM to identify superspreaders in high-speed data streams (e.g., OC-192) at line speed with more than 95% identification accuracy. A theoretical performance analysis evaluates the false negative ratio, false positive ratio, and compensation efficacy of SIS. Comprehensive experiments using real Internet traces from a tier-1 Internet Service Provider (ISP) network demonstrate that SIS is superior to related approaches in the trade-offs it achieves among required memory space, identification accuracy, and data processing efficiency and capacity.
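
The two data structures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `SuperspreaderSketch`, the hash construction, the table sizes, and the threshold are all assumptions chosen for illustration, and the paper's error compensation mechanisms and lightweight data operations are deliberately left out. The sketch only shows the basic idea of using a bitmap to decide whether a (source, destination) pair is a new flow, and a counting Bloom filter to accumulate a per-source count of new flows.

```python
import hashlib


def _hash(key: str, seed: int, modulus: int) -> int:
    """Seeded, deterministic hash of `key` into the range [0, modulus)."""
    digest = hashlib.blake2b(
        key.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % modulus


class SuperspreaderSketch:
    """Hypothetical, simplified sketch: bitmap + counting Bloom filter.

    All parameter values are illustrative assumptions, not taken from the paper.
    No error compensation is performed, so hash collisions bias the estimates.
    """

    def __init__(self, bitmap_bits=1 << 20, counters=1 << 16,
                 num_hashes=3, threshold=100):
        self.bitmap_bits = bitmap_bits
        self.bitmap = bytearray(bitmap_bits // 8)   # one bit per hashed flow
        self.counters = [0] * counters              # counting Bloom filter
        self.num_hashes = num_hashes
        self.threshold = threshold

    def _is_new_flow(self, src: str, dst: str) -> bool:
        """Test and set the bitmap bit for the (src, dst) flow."""
        pos = _hash(f"{src}>{dst}", 0, self.bitmap_bits)
        byte, bit = divmod(pos, 8)
        seen = (self.bitmap[byte] >> bit) & 1
        self.bitmap[byte] |= 1 << bit
        return not seen

    def _slots(self, src: str) -> set:
        """Counter positions for a source (deduplicated to avoid double counting)."""
        return {_hash(src, i + 1, len(self.counters))
                for i in range(self.num_hashes)}

    def observe(self, src: str, dst: str) -> None:
        """Process one packet: count the flow only the first time it is seen."""
        if self._is_new_flow(src, dst):
            for slot in self._slots(src):
                self.counters[slot] += 1

    def estimate(self, src: str) -> int:
        """Estimated number of distinct destinations (minimum over counters)."""
        return min(self.counters[slot] for slot in self._slots(src))

    def is_superspreader(self, src: str) -> bool:
        return self.estimate(src) >= self.threshold


if __name__ == "__main__":
    sketch = SuperspreaderSketch(threshold=50)
    for i in range(200):                 # one source, many distinct destinations
        sketch.observe("10.0.0.1", f"192.168.{i // 256}.{i % 256}")
    for _ in range(1000):                # one source, a single repeated destination
        sketch.observe("10.0.0.2", "192.168.0.1")
    print(sketch.is_superspreader("10.0.0.1"), sketch.is_superspreader("10.0.0.2"))
```

In this simplified form, bitmap collisions cause new flows to be missed (undercounting) and counter collisions between sources inflate counts (overcounting); these are the kinds of errors the abstract says the proposed system compensates for dynamically.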