Confident estimation for multistage measurement sampling and aggregation

Authors:
Edith Cohen;Nick Duffield;Carsten Lund;Mikkel Thorup
Affiliations:
AT&T Labs-Research, Florham Park, NJ, USA;AT&T Labs-Research, Florham Park, NJ, USA;AT&T Labs-Research, Florham Park, NJ, USA;AT&T Labs-Research, Florham Park, NJ, USA
Venue:
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Year:
2008

Citing 21
Cited 3

Application of sampling methodologies to network traffic characterization

SIGCOMM '93 Conference proceedings on Communications architectures, protocols and applications
New sampling-based summary statistics for improving approximate query answers

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Trajectory sampling for direct traffic observation

IEEE/ACM Transactions on Networking (TON)
Charging from sampled network usage

IMW '01 Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement
New directions in traffic measurement and accounting

Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications
Predicting resource usage and estimation accuracy in an IP flow measurement collection infrastructure

Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement
Flow sampling under hard resource constraints

Proceedings of the joint international conference on Measurement and modeling of computer systems
Building a better NetFlow

Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
A robust system for accurate real-time summaries of internet traffic

SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Estimating arbitrary subset sums with few probes

Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Sampling algorithms in a stream operator

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Estimating flow distributions from sampled flow statistics

IEEE/ACM Transactions on Networking (TON)
The DLT priority sampling is essentially optimal

Proceedings of the thirty-eighth annual ACM symposium on Theory of computing
Confidence intervals for priority sampling

SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Impact of packet sampling on anomaly detection metrics

Proceedings of the 6th ACM SIGCOMM conference on Internet measurement
Is sampled data sufficient for anomaly detection?

Proceedings of the 6th ACM SIGCOMM conference on Internet measurement
Optimal combination of sampled network measurements

IMC '05 Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement
The power of slicing in internet flow measurement

IMC '05 Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement
Sketching unaggregated data streams for subpopulation-size queries

Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Processing top k queries from samples

CoNEXT '06 Proceedings of the 2006 ACM CoNEXT conference
Computational challenges in parsing by classification

CHSLP '06 Proceedings of the Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing

Composable, scalable, and accurate weight summarization of unaggregated data sets

Proceedings of the VLDB Endowment
Do you know your IQ?: a research agenda for information quality in systems

ACM SIGMETRICS Performance Evaluation Review
Review: A survey of network flow applications

Journal of Network and Computer Applications

Abstract

Measurement, collection, and interpretation of network usage data commonly involve multiple stages of sampling and aggregation. Examples include sampling packets and aggregating them into flow statistics at a router, and sampling and aggregating usage records in a network data repository for reporting, query, and archiving. Although unbiased estimates of packet, byte, and flow usage can be formed for each sampling operation, for many applications it is crucial to know the inherent estimation error. Previous work in this area has been limited mainly to analyzing the estimator variance for particular methods, e.g., independent packet sampling. However, the variance is of limited use for more general sampling methods, where the estimate may not be well approximated by a Gaussian distribution. This motivates our paper, in which we establish Chernoff bounds on the likelihood of estimation error in a general multistage combination of measurement sampling and aggregation. We derive the scale against which errors are measured, in terms of the constituent sampling and aggregation operations. In particular, this enables us to obtain rigorous confidence intervals around any given estimate. We apply our method to a number of sampling schemes both in the literature and currently deployed, including sampling of packet-sampled NetFlow records, Sample and Hold, and Flow Slicing. We obtain one particularly striking result in the first case: that for a range of parameterizations, packet sampling has no additional impact on the estimator confidence derived from our bound, beyond that already imposed by flow sampling.
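To make the idea of a Chernoff-based confidence interval concrete, the following is a minimal sketch for the simplest case the abstract mentions: a single stage of independent packet sampling. It is not the multistage bound derived in the paper. It assumes each of n packets is kept independently with probability p, so the sampled count k is binomial, and it inverts the standard textbook multiplicative Chernoff bounds to bracket the unknown n. The function name `chernoff_interval` and its parameters are hypothetical, chosen for illustration only.

```python
import math


def chernoff_interval(k, p, alpha=0.05):
    """Conservative confidence interval for a packet count n.

    Assumption (not from the paper): each packet is sampled independently
    with probability p, and k sampled packets were observed. Returns
    (low, estimate, high), where estimate = k / p is the unbiased estimate.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    if k < 0:
        raise ValueError("k must be non-negative")

    # Each tail is allowed probability alpha / 2.
    log_term = math.log(2.0 / alpha)

    # Upper limit on mu = n * p: solve exp(-(mu - k)^2 / (2 mu)) = alpha / 2,
    # the lower-tail bound P(X <= k) <= exp(-delta^2 mu / 2).
    mu_high = k + log_term + math.sqrt(log_term ** 2 + 2.0 * k * log_term)

    # Lower limit on mu: solve exp(-(k - mu)^2 / (mu + k)) = alpha / 2,
    # the upper-tail bound P(X >= k) <= exp(-delta^2 mu / (2 + delta)).
    mu_low = k + log_term / 2.0 - math.sqrt(log_term ** 2 / 4.0 + 2.0 * k * log_term)
    mu_low = max(0.0, mu_low)

    # Rescale from the sampled count to the original packet count.
    return mu_low / p, k / p, mu_high / p


low, estimate, high = chernoff_interval(k=100, p=0.01, alpha=0.05)
print(f"estimate {estimate:.0f} packets, interval [{low:.0f}, {high:.0f}]")
```

The interval is asymmetric around the estimate and makes no Gaussian approximation, which is the kind of guarantee the abstract argues a variance alone cannot give. Because Chernoff bounds are not tight, it is wider than an exact binomial interval would be.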