Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
New directions in traffic measurement and accounting
ACM SIGCOMM Computer Communication Review
Catching Accurate Profiles in Hardware
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
The power of two choices in randomized load balancing
The power of two choices in randomized load balancing
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
Memory Efficient Algorithm for Mining Recent Frequent Items in a Stream
RSEISP '07 Proceedings of the international conference on Rough Sets and Intelligent Systems Paradigms
Adaptive shared-state sampling
Proceedings of the 8th ACM SIGCOMM conference on Internet measurement
Measurement data reduction through variation rate metering
INFOCOM'10 Proceedings of the 29th conference on Information communications
Sketching techniques for large scale NLP
WAC-6 '10 Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop
Hi-index | 0.00 |
Multi-hash-based count sketches are fast and memory efficient probabilistic data structures that are widely used in scalable online traffic monitoring applications. Their accuracy significantly improves with an optimization, called conservative update, which is especially effective when the aim is to discriminate a relatively small number of heavy hitters in a traffic stream consisting of an extremely large number of flows. Despite its widespread application, a thorough understanding of the conservative update operation has lagged behind, perhaps because of the significant modeling complexity involved. In this work we attempt to fill this gap. Our proposed modeling approach builds on a practically important empirical finding: simulation results (as well as experimental ones over real traffic traces) obtained for skewed load scenarios exhibit a sharp waterfall-type behaviour. That is, the approximate count provided by the sketch response remains accurate until an "error floor" is reached. Flows below this error flow level are on average approximated by the same error floor count value, irrespective of their exact count. The error floor itself appears to be maximal in the case of uniform load. Leveraging the simplifications made possible when the load is uniform, we derive an analytic model capable of accurately predicting the transient growth behavior of the (tightly correlated) counters deployed in the data structure and obtain an upper bound on the error floor level.