SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
New sampling-based summary statistics for improving approximate query answers
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Concrete Math
New directions in traffic measurement and accounting
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications
Efficient implementation of a statistics counter architecture
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Estimating flow distributions from sampled flow statistics
Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications
Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement
Flow sampling under hard resource constraints
Proceedings of the joint international conference on Measurement and modeling of computer systems
Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
A data streaming algorithm for estimating subpopulation flow size distribution
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A robust system for accurate real-time summaries of internet traffic
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Fisher information of sampled packets: an application to flow size estimation
Proceedings of the 6th ACM SIGCOMM conference on Internet measurement
Bottom-k sketches: better and more efficient estimation of aggregates
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Sketching unaggregated data streams for subpopulation-size queries
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Summarizing data using bottom-k sketches
Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
Tighter estimation using bottom k sketches
Proceedings of the VLDB Endowment
Composable, scalable, and accurate weight summarization of unaggregated data sets
Proceedings of the VLDB Endowment
Review: Passive internet measurement: Overview and guidelines based on experiences
Computer Communications
Fast Manhattan sketches in data streams
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Leveraging Zipf's law for traffic offloading
ACM SIGCOMM Computer Communication Review
Don't let the negatives bring you down: sampling from streams of signed updates
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Line speed accurate superspreader identification using dynamic error compensation
Computer Communications
Hi-index | 0.00 |
Statistical summaries of traffic in IP networks are at the heart of network operation and are used to recover information on the traffic of arbitrary subpopulations of flows. It is therefore of great importance to collect the most accurate and informative summaries given the router's resource constraints. Cisco's sampled NetFlow, based on aggregating a sampled packet stream into flows, is the most widely deployed such system. We observe two sources of inefficiency in current methods. Firstly, a single parameter (the sampling rate) is used to control utilization of both memory and processing/access speed, which means that it has to be set according to the bottleneck resource. Secondly, the unbiased estimators are applicable to summaries that in effect are collected through uneven use of resources during the measurement period (information from the earlier part of the measurement period is either not collected at all and fewer counter are utilized or discarded when performing a sampling rate adaptation). We develop algorithms that collect more informative summaries through an even and more efficient use of available resources. The heart of our approach is a novel derivation of unbiased estimators that use these more informative counts. We show how to efficiently compute these estimators and prove analytically that they are superior (have smaller variance on all packet streams and subpopulations) to previous approaches. Simulations on Pareto distributions and IP flow data show that the new summaries provide significantly more accurate estimates. We provide an implementation design that can be efficiently deployed at routers.