Modeling conservative updates in multi-hash approximate count sketches

Authors:
Giuseppe Bianchi;Ken Duffy;Douglas Leith;Vsevolod Shneer
Affiliations:
CNIT/Univ. Roma Tor Vergata, Italy;NUIM Hamilton Institute, Ireland;NUIM Hamilton Institute, Ireland;Heriot-Watt University, UK
Venue:
Proceedings of the 24th International Teletraffic Congress
Year:
2012




Abstract

Multi-hash-based count sketches are fast and memory-efficient probabilistic data structures that are widely used in scalable online traffic monitoring applications. Their accuracy improves significantly with an optimization called conservative update, which is especially effective when the aim is to discriminate a relatively small number of heavy hitters in a traffic stream consisting of an extremely large number of flows. Despite its widespread application, a thorough understanding of the conservative update operation has lagged behind, perhaps because of the significant modeling complexity involved. In this work we attempt to fill this gap. Our proposed modeling approach builds on a practically important empirical finding: simulation results (as well as experimental ones over real traffic traces) obtained for skewed load scenarios exhibit a sharp waterfall-type behavior. That is, the approximate count returned by the sketch remains accurate until an "error floor" is reached. Flows below this error floor level are on average approximated by the same error floor count value, irrespective of their exact count. The error floor itself appears to be maximal in the case of uniform load. Leveraging the simplifications made possible when the load is uniform, we derive an analytic model capable of accurately predicting the transient growth behavior of the (tightly correlated) counters deployed in the data structure and obtain an upper bound on the error floor level.
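For readers unfamiliar with the operation, the following is a minimal sketch of a multi-hash count sketch with an optional conservative update, written in Python. It is an illustration under stated assumptions, not the authors' implementation: the class name `CountSketch`, its parameters, the one-array-per-hash layout, and the salted BLAKE2b hashing are all hypothetical choices made for this example. A plain update adds the increment to every counter the key hashes to; a conservative update raises each of those counters only as far as the current estimate plus the increment.

```python
import hashlib


class CountSketch:
    """Illustrative multi-hash count sketch (hypothetical names and layout)."""

    def __init__(self, num_hashes=4, width=1024, conservative=True):
        self.num_hashes = num_hashes
        self.width = width
        self.conservative = conservative
        # One counter array per hash function (an assumption of this sketch).
        self.rows = [[0] * width for _ in range(num_hashes)]

    def _indexes(self, key):
        """Map a key to one counter index per row using salted BLAKE2b."""
        data = str(key).encode("utf-8")
        indexes = []
        for row in range(self.num_hashes):
            salt = row.to_bytes(2, "big")  # distinct salt per row
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            indexes.append(int.from_bytes(digest, "big") % self.width)
        return indexes

    def query(self, key):
        """Approximate count: the minimum over the key's counters."""
        return min(self.rows[r][i] for r, i in enumerate(self._indexes(key)))

    def update(self, key, count=1):
        indexes = self._indexes(key)
        if self.conservative:
            # Conservative update: lift each counter only up to the
            # current estimate plus the increment, never beyond it.
            target = min(self.rows[r][i] for r, i in enumerate(indexes)) + count
            for r, i in enumerate(indexes):
                if self.rows[r][i] < target:
                    self.rows[r][i] = target
        else:
            # Plain update: increment every counter the key maps to.
            for r, i in enumerate(indexes):
                self.rows[r][i] += count


if __name__ == "__main__":
    sketch = CountSketch(num_hashes=3, width=64)
    for flow in ["a", "b", "a", "c", "a"]:
        sketch.update(flow)
    print(sketch.query("a"))  # an estimate that is never below the true count of 3
```

With the same hash functions, every counter under conservative update stays at or below the corresponding counter under plain update, while the estimate still never falls below the true count. This is why the optimization reduces overestimation without introducing underestimation.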