Scalable identification and measurement of heavy-hitters
Computer Communications
We present two algorithms for the problem of identifying and measuring heavy-hitters. Our schemes report, with high probability, the flows that exceed a prescribed share of the traffic observed so far, along with an estimate of their sizes. One of the main advantages of our schemes is that they rely entirely on sampling. This makes them flexible and lightweight, allows them to be implemented in cheap DRAM, and lets them scale to very high speeds. Despite relying on sampling, our algorithms can provide very accurate results and offer performance guarantees that are independent of the traffic mix. Most notably, the schemes are shown to require a constant amount of memory regardless of the volume and composition of the observed traffic. Thus, besides being computationally light, cost-effective and flexible, they are scalable and robust against malicious traffic patterns. We provide theoretical and empirical results on their performance; the latter are obtained with software implementations and real traffic traces.
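To make the general idea concrete, the following is a minimal sketch of how a purely sampling-based detector can keep memory constant while flagging flows above a prescribed share of the traffic seen so far. It is NOT the authors' scheme, which the abstract does not describe in enough detail to reproduce. Instead it swaps in a generic technique, reservoir sampling (Algorithm R): it keeps a uniform random sample of `k` packets, estimates each flow's share from the sample, and scales that share by the number of packets observed to estimate the flow's size. The class name `ReservoirHeavyHitters` and the parameters `k` (sample size) and `phi` (share threshold) are hypothetical names chosen for illustration.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List


class ReservoirHeavyHitters:
    """Hypothetical illustration, not the paper's algorithm.

    Keeps a uniform random sample of k packets (reservoir sampling,
    Algorithm R), so memory stays O(k) regardless of how many packets
    or distinct flows are observed.
    """

    def __init__(self, k: int, seed: int = 0) -> None:
        self.k = k                            # fixed sample size (memory bound)
        self.n = 0                            # packets observed so far
        self.reservoir: List[Hashable] = []   # flow IDs of sampled packets
        self.rng = random.Random(seed)        # seeded for reproducibility

    def update(self, flow_id: Hashable) -> None:
        """Process one packet belonging to flow `flow_id`."""
        self.n += 1
        if len(self.reservoir) < self.k:
            # Fill the reservoir with the first k packets.
            self.reservoir.append(flow_id)
        else:
            # Keep the n-th packet with probability k/n, evicting a random slot.
            j = self.rng.randrange(self.n)
            if j < self.k:
                self.reservoir[j] = flow_id

    def heavy_hitters(self, phi: float) -> Dict[Hashable, float]:
        """Return {flow_id: estimated packet count} for flows whose
        share of the sample exceeds phi."""
        if not self.reservoir:
            return {}
        counts = Counter(self.reservoir)
        m = len(self.reservoir)
        # Sampled share c/m estimates the flow's share of all n packets.
        return {f: c / m * self.n for f, c in counts.items() if c / m > phi}


# Usage: one "elephant" flow carries half the packets; every "mouse" flow
# sends a single packet, so only the elephant should exceed phi = 0.2.
detector = ReservoirHeavyHitters(k=1000, seed=42)
for i in range(10_000):
    detector.update("elephant")
    detector.update(f"mouse-{i}")
print(detector.heavy_hitters(phi=0.2))
```

The design choice illustrated here is that memory depends only on `k`, not on the traffic volume or the number of flows. Larger `k` tightens the share estimates at the cost of more memory. The accuracy guarantees and constant-memory results claimed in the abstract belong to the authors' own schemes and are not established by this sketch.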