Existing methods for detecting and measuring heavy hitters (frequent items) are either lightweight but too inaccurate and memory-demanding (e.g. those relying on sampling), or too heavyweight to be deployed at high speeds. In this paper, we present several sampling-based algorithms for the problem and show that they exhibit two critical features. First, despite sampling, our schemes provide accurate results and detection guarantees that are independent of the traffic properties. Second, they provably require memory that is not only constant regardless of the amount and composition of the traffic observed, but also within a small factor of the theoretical minimum. Thus, unlike most solutions, ours scale in both space and speed, with sampling allowing performance to be traded off against cost. As we will see, our algorithms build on similar principles. The first two use a constant sampling probability. Extending the second to support a variable sampling rate, adjusted according to traffic intensity and available CPU, yields our third scheme: a highly versatile solution that performs quasi-optimally and requires minimal configuration.
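The paper's own algorithms are not given in this excerpt. The following is therefore only a minimal Python sketch of the general idea, built on stated assumptions. Packets are kept with probability p (Bernoulli sampling). Each kept packet is counted with weight 1/p, so changing p mid-stream, as a variable-rate scheme would, does not bias the counts. Counters live in a fixed-size table managed with the weighted Space-Saving eviction rule of Metwally et al., a technique swapped in here for illustration, so memory stays constant however much traffic arrives. The class name `SampledSpaceSaving`, its methods, and the parameters `capacity`, `p` and `phi` are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Optional, Tuple


class SampledSpaceSaving:
    """Hypothetical sketch (not the paper's algorithm): Bernoulli packet
    sampling feeding a fixed-size weighted Space-Saving table, so memory
    stays at `capacity` counters no matter how much traffic is observed."""

    def __init__(self, capacity: int, p: float, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.counts: Dict[Hashable, float] = {}
        self.total_seen = 0  # packets observed, whether sampled or not
        self.set_rate(p)

    def set_rate(self, p: float) -> None:
        # A variable-rate controller (e.g. driven by load or spare CPU) could
        # call this at any time; every later sample is weighted by 1/p.
        if not 0.0 < p <= 1.0:
            raise ValueError("sampling probability must be in (0, 1]")
        self.p = p

    def observe(self, item: Hashable) -> None:
        self.total_seen += 1
        if self.rng.random() >= self.p:  # drop the packet with probability 1 - p
            return
        weight = 1.0 / self.p  # inverse-probability weight for unbiased counts
        if item in self.counts:
            self.counts[item] += weight
        elif len(self.counts) < self.capacity:
            self.counts[item] = weight
        else:
            # Space-Saving eviction: the newcomer replaces the smallest counter
            # and inherits its value, so a tracked item's counter never
            # underestimates its weighted sampled count. (A linear scan for
            # the minimum is a simplification; real implementations use an
            # O(1) stream-summary structure.)
            victim = min(self.counts, key=self.counts.__getitem__)
            self.counts[item] = self.counts.pop(victim) + weight

    def heavy_hitters(self, phi: float) -> List[Tuple[Hashable, float]]:
        # Report items whose estimated count is at least phi * total traffic.
        cutoff = phi * self.total_seen
        hits = [(k, v) for k, v in self.counts.items() if v >= cutoff]
        return sorted(hits, key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    s = SampledSpaceSaving(capacity=3, p=1.0, seed=0)
    for pkt in ["a"] * 50 + ["b"] * 30 + list("cdefg"):
        s.observe(pkt)
    print(s.heavy_hitters(0.2))  # [('a', 50.0), ('b', 30.0)]
```

With `p=1.0` nothing is dropped, and the five singleton items compete for the one remaining counter while the two heavy items keep exact counts. With a smaller `p`, the counts become estimates whose variance grows as `p` shrinks, which is the performance-for-cost trade-off the abstract refers to.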