Flow sampling under hard resource constraints

Authors:
Nick Duffield;Carsten Lund;Mikkel Thorup
Affiliations:
AT&T Labs--Research, Florham Park, NJ;AT&T Labs--Research, Florham Park, NJ;AT&T Labs--Research, Florham Park, NJ
Venue:
Proceedings of the joint international conference on Measurement and modeling of computer systems
Year:
2004

Abstract

Many network management applications take as their input traffic volumes differentiated by attributes such as IP address or port number. IP flow records are commonly collected for this purpose, since they make it possible to determine fine-grained usage of network resources. However, the increasingly large volumes of flow statistics incur concomitant costs in the resources of the measurement infrastructure. This motivates sampling of flow records.

This paper addresses sampling strategy for flow records. Recent work has shown that non-uniform sampling is necessary in order to control the estimation variance that arises from the observed heavy-tailed distribution of flow lengths. However, while this approach controls estimator variance, it does not place hard limits on the number of flows sampled. Such limits are often required during the arbitrary downstream sampling, resampling and aggregation operations employed in analysis of the data.

This paper proposes a correlated sampling strategy that is able to select an arbitrarily small number of the "best" representatives of a set of flows. We show that usage estimates arising from such a selection are unbiased, and show how to estimate their variance, both offline for modeling purposes and online during the sampling itself. The selection algorithm can be implemented in a queue-like data structure in which memory usage is uniformly bounded during measurement. Finally, we compare the complexity and performance of our scheme with other potential approaches.
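The abstract does not spell out the selection rule, so the following is only a minimal sketch of one well-known scheme with the properties it describes: a fixed-size sample, unbiased usage estimates, and a bounded queue-like structure. It assumes a priority-style rule in which each flow of size w draws a uniform u in (0, 1], receives priority w/u, and the k highest-priority flows are kept. Whether this matches the paper's algorithm in every detail is an assumption. The names `priority_sample`, `flows` and `k` are hypothetical.

```python
import heapq
import random


def priority_sample(flows, k, rng=random):
    """Keep at most k flows from an iterable of (key, size) pairs.

    Illustrative sketch only, not taken from the paper.
    Returns (threshold, [(key, size, estimate), ...]).
    """
    heap = []  # min-heap of (priority, key, size); never holds more than k + 1 entries
    for key, size in flows:
        u = 1.0 - rng.random()  # uniform in (0, 1], avoids division by zero
        entry = (size / u, key, size)
        if len(heap) <= k:
            heapq.heappush(heap, entry)
        else:
            # Insert the new flow and evict the lowest priority in one step.
            heapq.heappushpop(heap, entry)
    if len(heap) <= k:
        # Fewer than k + 1 flows were seen: keep them all, estimates are exact.
        return 0.0, [(key, size, float(size)) for _, key, size in heap]
    # The (k + 1)-th highest priority acts as the sampling threshold.
    threshold, _, _ = heapq.heappop(heap)
    return threshold, [(key, size, max(float(size), threshold))
                       for _, key, size in heap]


# Hypothetical usage: 1000 synthetic flows with heavy-tailed sizes.
rng = random.Random(1)
flows = [("flow-%d" % i, int(rng.paretovariate(1.2) * 100)) for i in range(1000)]
threshold, sample = priority_sample(flows, k=50, rng=rng)
estimated_total = sum(est for _, _, est in sample)
```

Under this rule, the usage of any subset of flows is estimated by summing the `estimate` values of the sampled flows that fall in the subset. Memory stays at k + 1 heap entries no matter how many flows arrive, which is the kind of hard limit the abstract asks for.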