Inferential time-decaying Bloom filters
Proceedings of the 16th International Conference on Extending Database Technology
A probabilistic data stream $S$ is defined as a sequence of uncertain tuples $\langle t_i, p_i \rangle, i = 1, \ldots, \infty$, with the semantics that element $t_i$ occurs in the stream with probability $p_i \in (0,1)$. Thus each distinct element $t$ that occurs in tuples of $S$ has an existential probability determined by the tuples $\langle t, p_i \rangle \in S$. Existing duplicate detection methods for traditional deterministic data streams cannot maintain these existential probabilities for elements in $S$, even though they are important information for answering queries. In this paper, we present a novel data structure, the Floating Counter Bloom Filter (FCBF), an extension of the Counting Bloom Filter (CBF) [1] that maintains these existential probabilities effectively. Based on FCBF, we present an efficient algorithm to approximately detect duplicates in probabilistic data streams over sliding windows. Given a sliding window size $W$ and floating counter number $N$, for any $t$ that occurs in the past sliding window, our method outputs the accurate existential probability of $t$ with probability $1-(1/2)^{\ln(2) \cdot N/W}$. Our experimental results on synthetic data verify the effectiveness of our approach.
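The abstract does not say how FCBF stores or combines probabilities, so the following Python code is only a hypothetical sketch of the idea, not the paper's FCBF. It assumes that tuples of the same element occur independently, so the existential probability of $t$ is $1-\prod_i(1-p_i)$, and it stores in each floating counter the sum of $-\ln(1-p_i)$ over the tuples hashed to it. Inserting a tuple adds to the $k$ counters of $t$, expiring a tuple from the window subtracts from them, and a query takes the minimum of the $k$ counters, which is exact whenever at least one of them is not shared with another element. The names `SlidingFloatingCounterBF`, `insert`, `query` and `existential_probability` are invented for this example, and the raw window is kept in a deque only so the sketch knows what to subtract on expiry; a space-efficient design would avoid that.

```python
import hashlib
import math
from collections import deque


def existential_probability(probs):
    """Exact existential probability of one element, ASSUMING its tuples
    occur independently: 1 - prod(1 - p_i)."""
    absent = 1.0
    for p in probs:
        absent *= 1.0 - p
    return 1.0 - absent


class SlidingFloatingCounterBF:
    """Hypothetical sketch (not the paper's FCBF): a counting Bloom filter
    whose cells hold floating-point sums of -ln(1 - p) instead of integer counts."""

    def __init__(self, num_counters, num_hashes, window):
        self.cells = [0.0] * num_counters  # N floating counters
        self.k = num_hashes                # number of hash functions
        self.window = window               # sliding window size W, in tuples
        self.recent = deque()              # tuples currently inside the window

    def _positions(self, t):
        # Distinct cell indices for t, derived from salted SHA-256 digests.
        positions = set()
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{t}".encode()).digest()
            positions.add(int.from_bytes(digest[:8], "big") % len(self.cells))
        return positions

    def _update(self, t, weight):
        for pos in self._positions(t):
            self.cells[pos] += weight

    def insert(self, t, p):
        """Process tuple <t, p> and expire the oldest tuple once the window is full."""
        self.recent.append((t, p))
        self._update(t, -math.log1p(-p))  # add -ln(1 - p)
        if len(self.recent) > self.window:
            old_t, old_p = self.recent.popleft()
            self._update(old_t, math.log1p(-old_p))  # subtract -ln(1 - p)

    def query(self, t):
        """Estimated existential probability of t within the current window.
        Each of t's cells holds at least t's own sum, so the minimum is exact
        unless every one of t's cells is shared with another element."""
        s = min(self.cells[pos] for pos in self._positions(t))
        return 1.0 - math.exp(-max(s, 0.0))  # clamp tiny negative float residue


f = SlidingFloatingCounterBF(num_counters=1024, num_hashes=3, window=4)
for t, p in [("a", 0.5), ("b", 0.2), ("a", 0.5)]:
    f.insert(t, p)
print(f.query("a"), existential_probability([0.5, 0.5]))
```

With $N$ counters and at most $W$ elements in the window, the standard Bloom filter choice $k \approx \ln(2) \cdot N/W$ makes the chance that all $k$ counters of $t$ are shared roughly $(1/2)^k$, which appears consistent with the $1-(1/2)^{\ln(2) \cdot N/W}$ guarantee quoted above; the abstract does not state whether the paper derives it this way.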