Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Transactions on Networking (TON).
Space/time trade-offs in hash coding with allowable errors. Communications of the ACM.
Exact and approximate membership testers. STOC '78: Proceedings of the Tenth Annual ACM Symposium on Theory of Computing.
Journal of Algorithms.
Duplicate detection in click streams. WWW '05: Proceedings of the 14th International Conference on World Wide Web.
Approximately detecting duplicates for streaming data using stable Bloom filters. Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data.
Detecting Click Fraud in Pay-Per-Click Streams of Online Advertising Networks. ICDCS '08: Proceedings of the 28th International Conference on Distributed Computing Systems (2008).
On the evolution of user interaction in Facebook. Proceedings of the 2nd ACM Workshop on Online Social Networks.
An Optimal Bloom Filter Replacement Based on Matrix Solving. CSR '09: Proceedings of the Fourth International Computer Science Symposium in Russia on Computer Science - Theory and Applications.
IEEE Transactions on Knowledge and Data Engineering.
There has been a long history of work on space-efficient data structures that support approximate membership queries, starting with Bloom's work in the 1970s. Given a set A of n items and an additional item x from the same universe U of size m ≫ n, we want to decide whether x ∈ A using small (limited) space. If A is static, there exist optimal algorithms that construct a randomized data structure representing A in only (1 + o(1)) n log(1/δ) bits; it allows a small false-positive probability δ but no false negatives. However, existing optimal algorithms are not practical for many event-based systems, e.g., web services, peer-to-peer systems, and network traffic monitoring. In these systems, items are inserted or updated dynamically in a stream of events, and we are interested in recently updated items. In this paper, we propose a novel data structure that supports approximate membership queries in a time-decaying window model. In this model, items are inserted one by one over a data stream, and we want to determine whether an item is among the most recent w items for any given window size w ≤ n. Our data structure requires only O(n(log(1/δ) + log n)) bits and O(1) running time.
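To make the window model concrete, here is a minimal Python sketch of a simple fingerprint-plus-timestamp baseline. It is an illustrative assumption, not the construction proposed in the paper: the class `WindowedMembership`, its methods, and the use of BLAKE2b fingerprints are all hypothetical. It keeps an f-bit fingerprint, with f = ⌈log2(n/δ)⌉, and the arrival time of each of the last n items. For any w ≤ n it never reports a false negative, and a query for an item outside the window returns true only if one of the at most w ≤ n items inside the window shares its fingerprint, which under an idealized random hash happens with probability at most n·2^(-f) ≤ δ.

```python
import hashlib
import math


class WindowedMembership:
    """Hypothetical baseline (not the paper's method): approximate membership
    among the most recent w <= n stream items, with no false negatives and a
    false-positive probability of at most delta under an idealized hash."""

    def __init__(self, n: int, delta: float):
        if n < 1 or not 0 < delta < 1:
            raise ValueError("need n >= 1 and 0 < delta < 1")
        self.n = n
        # f-bit fingerprints: union bound over <= n window items gives n / 2^f <= delta.
        self.f = max(1, math.ceil(math.log2(n / delta)))
        self.ring = [None] * n  # fingerprint of the item inserted at time t lives in slot t % n
        self.latest = {}        # fingerprint -> most recent arrival time
        self.t = 0              # number of items inserted so far

    def _fp(self, item) -> int:
        # Deterministic f-bit fingerprint (Python's built-in hash is salted per process).
        nbytes = (self.f + 7) // 8
        digest = hashlib.blake2b(repr(item).encode(), digest_size=nbytes).digest()
        return int.from_bytes(digest, "big") & ((1 << self.f) - 1)

    def insert(self, item) -> None:
        slot = self.t % self.n
        old = self.ring[slot]
        # Evict the item that arrived n steps ago; drop its map entry only if
        # no newer item with the same fingerprint has arrived since then.
        if old is not None and self.latest.get(old) == self.t - self.n:
            del self.latest[old]
        fp = self._fp(item)
        self.ring[slot] = fp
        self.latest[fp] = self.t
        self.t += 1

    def query(self, item, w: int) -> bool:
        """True if item (probably) appeared among the last w insertions."""
        if not 1 <= w <= self.n:
            raise ValueError("window size must satisfy 1 <= w <= n")
        last = self.latest.get(self._fp(item))
        return last is not None and last >= self.t - w


wm = WindowedMembership(n=4, delta=0.01)
for x in ["a", "b", "c", "d", "e"]:
    wm.insert(x)
print(wm.query("e", 1))  # True
print(wm.query("b", 4))  # True: b, c, d, e are the last 4 items
```

Deleting a map entry only when its stored time equals the evicted slot's time is what keeps a repeated item visible after an older copy of it leaves the ring. The sketch keeps timestamps as unbounded Python integers for simplicity; storing them modulo 2n instead would bring the footprint to roughly n(f + log n) bits, i.e. O(n(log(1/δ) + log n)), with expected O(1) time per operation from the hash map.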